GIS Group of Inductive Statistics


A list of our published papers can be found on the GIS Google Scholar profile (goo.gl/KsjbmE).

Projects:

Bayesian Meta-analysis. The role of meta-analysis is to statistically summarize the published studies on a specific problem. It has become increasingly important with the advancement of science and the growth in the number of publications. Synthesizing the available information facilitates understanding and leads to more robust conclusions. Glass (1976) defines meta-analysis as an analysis of analyses, that is, a statistical analysis that aims to combine the results of previous analyses of different studies addressing the same question. A meta-analysis combines studies carried out under different conditions, with different levels of precision, by research groups from different regions and backgrounds. Thus, its conclusions are expected to be broader than those obtained by any of the individual studies that make up the systematic review (Fagard et al., 1996). Meta-analysis also offers the opportunity to reconcile differences between regions, countries and groups, and it provides an estimate of the average effect by combining the results of the studies it includes.
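As a minimal sketch of how an average effect can be combined across studies, the code below uses a normal-normal model: each study reports an effect estimate y_i with standard error se_i, and y_i ~ Normal(mu, se_i^2 + tau^2). It assumes the between-study standard deviation tau is known and puts a flat prior on mu, in which case the posterior of mu is normal with an inverse-variance weighted mean. The function name and the study numbers are hypothetical illustrations, not the group's method; a full Bayesian analysis would usually also place a prior on tau.

```python
import math
from typing import Sequence, Tuple


def pooled_effect_posterior(
    effects: Sequence[float],
    std_errors: Sequence[float],
    tau: float = 0.0,
) -> Tuple[float, float]:
    """Posterior mean and standard deviation of the average effect mu.

    Assumed model (illustrative): y_i ~ Normal(mu, se_i**2 + tau**2), with the
    between-study standard deviation tau treated as known and a flat prior on mu.
    With tau = 0 this reduces to inverse-variance (fixed-effect) pooling.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its total variance.
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    total_weight = sum(weights)
    post_mean = sum(w * y for w, y in zip(weights, effects)) / total_weight
    post_sd = math.sqrt(1.0 / total_weight)
    return post_mean, post_sd


if __name__ == "__main__":
    # Hypothetical effect estimates and standard errors from three studies.
    ys = [0.20, 0.35, 0.50]
    ses = [0.10, 0.15, 0.20]
    m, s = pooled_effect_posterior(ys, ses, tau=0.1)
    print(f"posterior mean {m:.3f}, 95% credible interval "
          f"({m - 1.96 * s:.3f}, {m + 1.96 * s:.3f})")
```

Increasing tau spreads the weights more evenly across studies and widens the credible interval, which is how between-study heterogeneity enters the combined estimate.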

Estimation of the reliability of coherent systems. Estimating the reliability of coherent systems is of great importance in engineering, though its relevance is not limited to that field. As far as we know, there is not yet a complete solution to the problem; we have been working on it and advancing the state of the art. Many ideas developed for reliability can also be used to solve problems in survival analysis. The best-known system in reliability theory is the bridge system (Barlow and Proschan, 1981). It is a coherent system whose properties have been studied and discussed. However, the statistical properties of estimators of the system reliability function, and especially of the reliability functions of its components, have not yet been fully studied. Important related references in survival analysis include Cox (1972), Breslow and Crowley (1974) and Kaplan and Meier (1958). Examples of developments for competing risks (series systems) are Aalen (1976), Tsiatis (1975), Peterson (1977) and Salinas-Torres et al. (2002). For parallel systems, Polpo and Pereira (2009) presented a Bayesian nonparametric solution to the estimation of the distribution of the system's components.
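To make the bridge system concrete, here is a small sketch of its structure function and of the system reliability when the component reliabilities are known and the components are independent. It is only the probabilistic starting point, not the statistical estimation problem the project addresses. Components are indexed 0 to 4 as an assumption of this sketch: 0 and 1 leave the source, 3 and 4 reach the sink, and 2 is the bridge, giving the minimal path sets {0, 3}, {1, 4}, {0, 2, 4} and {1, 2, 3}. The names `bridge_works` and `bridge_reliability` are hypothetical.

```python
from itertools import product
from typing import Sequence

# Minimal path sets of the five-component bridge (component 2 is the bridge).
PATH_SETS = [{0, 3}, {1, 4}, {0, 2, 4}, {1, 2, 3}]


def bridge_works(state: Sequence[bool]) -> bool:
    """Structure function: the system works if some minimal path is fully working."""
    return any(all(state[i] for i in path) for path in PATH_SETS)


def bridge_reliability(p: Sequence[float]) -> float:
    """System reliability for independent components with reliabilities p[0..4],
    computed exactly by enumerating all 2**5 component states."""
    if len(p) != 5:
        raise ValueError("the bridge system has exactly five components")
    total = 0.0
    for state in product([False, True], repeat=5):
        if bridge_works(state):
            # Probability of this particular configuration of working/failed parts.
            prob = 1.0
            for works, pi in zip(state, p):
                prob *= pi if works else 1.0 - pi
            total += prob
    return total


if __name__ == "__main__":
    print(bridge_reliability([0.9, 0.9, 0.8, 0.95, 0.95]))
```

When all five components share the same reliability p, pivoting on the bridge component gives the closed form h(p) = 2p^2 + 2p^3 - 5p^4 + 2p^5, which the enumeration reproduces; brute force is used here only because it generalizes directly to unequal component reliabilities.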

Functional data analysis. Functional data are data in which each observation is a real-valued function rather than a scalar or a vector. This kind of data has become more common since the development of real-time measuring devices. Functional data analysis is a relatively recent area of statistics, and its main procedures were established by Ramsay and Dalzell (1991) and Ramsay and Silverman (1997). This project was motivated by a physiotherapy problem involving human gait. Studying human gait is important for understanding whether a person's movement is normal and for developing methods to prevent, or recover from, changes in normal movement. Olshen et al. (1989) proposed a model for obtaining confidence regions for functional data using the bootstrap.
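The following is a sketch of one simple way to build a bootstrap confidence region for a mean curve; it is not necessarily the construction of Olshen et al. (1989). It assumes all curves (for example, joint angles over a gait cycle) are sampled on a common grid, resamples whole curves with replacement, and uses the empirical (1 - alpha) quantile of the maximum absolute deviation between bootstrap and original mean curves as the half-width of a simultaneous band of constant width. The function names and the synthetic data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Curve = Sequence[float]


def mean_curve(curves: Sequence[Curve]) -> List[float]:
    """Pointwise mean of curves observed on a common grid."""
    n = len(curves)
    return [sum(c[t] for c in curves) / n for t in range(len(curves[0]))]


def bootstrap_band(
    curves: Sequence[Curve],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[List[float], List[float], List[float]]:
    """Simultaneous (1 - alpha) band for the mean curve via the bootstrap."""
    rng = random.Random(seed)
    n = len(curves)
    center = mean_curve(curves)
    max_devs = []
    for _ in range(n_boot):
        # Resample entire curves, keeping each subject's curve intact.
        resample = [curves[rng.randrange(n)] for _ in range(n)]
        boot_mean = mean_curve(resample)
        # Sup-norm distance between bootstrap mean and observed mean curve.
        max_devs.append(max(abs(b - c) for b, c in zip(boot_mean, center)))
    max_devs.sort()
    k = max(0, math.ceil((1 - alpha) * n_boot) - 1)
    half_width = max_devs[k]
    lower = [c - half_width for c in center]
    upper = [c + half_width for c in center]
    return center, lower, upper


if __name__ == "__main__":
    # Synthetic, gait-like periodic curves on 51 grid points (hypothetical data).
    gen = random.Random(42)
    grid = [i / 50 for i in range(51)]
    data = [[math.sin(2 * math.pi * t) + gen.gauss(0, 0.2) for t in grid]
            for _ in range(25)]
    center, lower, upper = bootstrap_band(data, n_boot=500)
    print(f"band half-width: {upper[0] - center[0]:.3f}")
```

Using the maximum deviation over the whole grid, rather than pointwise quantiles, is what makes the band cover the entire mean curve simultaneously with approximate probability 1 - alpha.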

Genomic Analysis. A common procedure in Statistics and Machine Learning when dealing with data sets of thousands of variables is to sort the variables according to some measure of how important they are for predicting and/or retrospectively understanding a certain target variable (or, equivalently, an indicator of which group or population each sample belongs to). Classical examples of such a procedure are Student's t-test and the Wilcoxon rank-sum (Mann-Whitney U) test (Demsar, 2006; Fay and Proschan, 2010; Mann and Whitney, 1947), whose statistics are often used to sort variables in order of importance. Arguably, they are the most commonly used methods for this problem in biomedical applications, in part because they are readily available and easy to use. A typical scenario involves gene expression data from cancer patients and a class variable that identifies whether each patient relapsed (in other words, whether the cancer came back after treatment or surgery). The ability to sort variables in a meaningful order has a range of applications in many fields and can also be seen as a means of performing feature selection (Mitchell, 1997; Witten et al., 2011).

Quantile regression. Many studies deal with the quantile regression model for right-censored data under non-parametric and semi-parametric approaches (see, for example, BuHamra et al. (2004); Fung et al. (2012); Koenker (2008); Lin et al. (2012)). Semi-parametric models such as Cox’s (1972) proportional hazards model and linear transformation models (Cheng et al., 1995) are very popular for modeling the effects of covariates on a survival response. Several authors, including Ying et al. (1995), have given compelling arguments in favor of focusing on the quantiles of the survival time when modeling and reporting data analysis results. Most of these semi-parametric and non-parametric approaches are based on self-consistency and on martingale-based estimating equations, for example for median regression (Cheng et al., 1997; Portnoy, 2003; Peng and Huang, 2008). Carroll and Ruppert (1984) and Fitzmaurice et al. (2007) propose parametric versions of a Box-Cox transform-both-sides regression model that consider only uncensored continuous responses, the original Box-Cox transformation, and a normal error distribution.
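The idea underlying quantile regression can be illustrated with the check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}). Minimizing the sum of rho_tau(y_i - q) over a constant q yields a sample tau-quantile; quantile regression replaces q with a linear predictor x_i'beta. The minimal sketch below covers only this intercept-only, uncensored case and ignores the censoring that the project deals with; the function names are hypothetical.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile(y: Sequence[float], tau: float) -> float:
    """A tau-quantile obtained as a minimizer of sum_i rho_tau(y_i - q).

    The objective is convex and piecewise linear in q with kinks at the
    observations, so a minimizer is always attained at one of the data points.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


if __name__ == "__main__":
    sample = [2.1, 3.4, 0.7, 5.9, 4.4, 1.8, 3.0]
    for tau in (0.25, 0.5, 0.75):
        print(tau, empirical_quantile(sample, tau))
```

The asymmetric weights tau and 1 - tau are what move the minimizer away from the median: residuals above q cost tau per unit and residuals below q cost 1 - tau. This brute-force search is O(n^2) and is meant only to show the principle; practical fits use linear programming.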

Regression models for categorical data. The development of the generalized linear model (Nelder and Wedderburn, 1972) provided an important tool for regression with categorical data. The most popular link functions are the logit and the probit (McCullagh and Nelder, 1989). Many studies have discussed the limitations of these symmetric links. It is well accepted that, when the proportion of a binary response approaches zero at a different rate than it approaches one, a symmetric link may not be appropriate (Chen et al., 1999). Many parametric families of link functions have been developed. Works with one-parameter link functions include Aranda-Ordaz (1981), Chen et al. (1999) and Guerrero and Johnson (1982); works with two-parameter link functions include Stukel (1988), Prentice (1976) and Czado (1994).
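To illustrate the difference between symmetric and asymmetric links, the sketch below implements inverse link functions F(eta) mapping a linear predictor eta to a probability. A link is symmetric when F(eta) + F(-eta) = 1, so the probability approaches 0 and 1 at the same rate; the logit and probit satisfy this, while the complementary log-log does not. The sketch also includes the one-parameter asymmetric family of Aranda-Ordaz (1981), written here in one common form, pi = 1 - (1 + lambda * e^eta)^(-1/lambda) with lambda > 0, which gives the logit at lambda = 1 and approaches the complementary log-log as lambda tends to 0. The function names are hypothetical.

```python
import math
from typing import Callable, Iterable


def inv_logit(eta: float) -> float:
    """Logistic inverse link, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def inv_probit(eta: float) -> float:
    """Standard normal CDF, i.e. the probit inverse link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta: float) -> float:
    """Complementary log-log inverse link (asymmetric)."""
    return 1.0 - math.exp(-math.exp(eta))


def inv_aranda_ordaz(eta: float, lam: float) -> float:
    """Aranda-Ordaz asymmetric family: pi = 1 - (1 + lam * e^eta)^(-1/lam), lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return 1.0 - (1.0 + lam * math.exp(eta)) ** (-1.0 / lam)


def is_symmetric(inv_link: Callable[[float], float],
                 etas: Iterable[float] = (0.0, 0.5, 1.0, 2.0, 4.0),
                 tol: float = 1e-12) -> bool:
    """Check F(eta) + F(-eta) == 1 on a few test points."""
    return all(abs(inv_link(e) + inv_link(-e) - 1.0) < tol for e in etas)


if __name__ == "__main__":
    for name, f in [("logit", inv_logit), ("probit", inv_probit),
                    ("cloglog", inv_cloglog)]:
        print(name, is_symmetric(f), round(f(0.0), 4))
```

Because lambda indexes a whole family, estimating it from the data lets the model choose how asymmetric the link should be instead of fixing that choice in advance.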