
2016

Spring 2016

Time: 11am, Wednesday May 4

Place: Basement classroom, Department of Statistics

Speaker: Professor Wei Zhang, Department of Preventive Medicine, Northwestern University

Title: Elucidating the Genetic Basis of Epigenetic Variations for Understanding Human Complex Traits

Abstract: Inter-individual variation in cytosine modifications could contribute to heterogeneity in disease risks and other complex traits. We assessed the population specificity and genetic architecture of cytosine modifications at 283,540 CpG sites in lymphoblastoid cell lines (LCLs) derived from independent samples of European and African descent. Our study suggested that a substantial proportion of CpGs are population-specific, which may underlie health disparities in certain diseases. Cytosine modification variation was found to be controlled primarily locally by a single major modification quantitative trait locus (mQTL) and additional minor loci. Local genetic epistasis was detectable for a small proportion of CpG sites, which were enriched more than nine-fold for CpG sites mapped to population-specific mQTL. About 11% of the unique SNPs reported with genome-wide significance in the Genome-Wide Association Study (GWAS) Catalog were annotated, 78% being mQTL and 31% being eQTL (expression quantitative trait loci) in LCLs; these covered 37% of the investigated diseases/traits and provided insights into the underlying biological mechanisms.

Time: 11am, Wednesday April 20

Place: Basement classroom, Department of Statistics

Speaker: Professor Kung-Sik Chan, Department of Statistics and Actuarial Science, University of Iowa

Title: Quasi-Likelihood Estimation of a Censored Autoregressive Model with Exogenous Variable

Abstract: Maximum likelihood estimation of a censored autoregressive model with exogenous variables (CARX), for instance the tobit model with autoregressive regression errors, requires computing the conditional likelihood of blocks of data of variable dimension. As the random block dimension generally increases with the censoring rate, maximum likelihood estimation quickly becomes numerically intractable as censoring increases. We introduce a new estimation approach using the complete-incomplete data framework, with the complete data comprising the observations that would have been obtained had there been no censoring. Motivated by the complete-data score vector, we introduce a system of unbiased estimating equations for estimating a CARX model. The proposed quasi-likelihood method reduces to maximum likelihood estimation when there is no censoring, and it is computationally efficient. We derive the consistency and asymptotic normality of the quasi-likelihood estimator under mild regularity conditions. We illustrate the efficacy of the proposed method with simulations and a real application to phosphorus concentration in river water.
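
To make the censoring mechanism in the abstract concrete, the following sketch (illustrative only; the function name and parameter values are my own, not from the talk) simulates a tobit-type CARX model: a latent regression with AR(1) errors, observed only above a left-censoring limit.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_carx(n=500, beta=1.0, phi=0.6, sigma=1.0, c=0.0):
    """Simulate a left-censored regression with AR(1) errors (tobit-type CARX).

    Latent model:  y*_t = beta * x_t + e_t,  e_t = phi * e_{t-1} + eps_t.
    Observed:      y_t  = max(y*_t, c), plus a censoring indicator.
    """
    x = rng.normal(size=n)
    eps = rng.normal(scale=sigma, size=n)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + eps[t]
    y_latent = beta * x + e
    y_obs = np.maximum(y_latent, c)       # values below c are censored at c
    censored = y_latent < c
    return x, y_obs, censored

x, y, censored = simulate_carx(c=0.0)
print(f"censoring rate: {censored.mean():.2f}")
```

Raising the limit c increases the censoring rate, which is exactly the regime where, per the abstract, full maximum likelihood becomes intractable and the quasi-likelihood approach pays off.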

Time: 11am, Wednesday April 6

Place: Basement classroom, Department of Statistics

Speaker: Danna Zhang, Department of Statistics, University of Chicago

Title: Gaussian Approximation for High Dimensional Time Series

Abstract: I will talk about the problem of approximating sums of high-dimensional stationary time series by Gaussian vectors, using the framework of functional dependence measures. The validity of the Gaussian approximation depends on the sample size $n$, the dimension $p$, the moment conditions and the dependence structure of the underlying processes. I will also introduce an estimator for long-run covariance matrices and discuss its convergence properties. The results allow the construction of simultaneous confidence intervals for mean vectors of high-dimensional time series with asymptotically correct coverage probabilities. A Gaussian multiplier bootstrap method is also proposed. A simulation study indicates the quality of the Gaussian approximation for different $n$ and $p$ under various moment and dependence conditions.
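
As a rough illustration of the Gaussian multiplier bootstrap idea, here is its simplest i.i.d. form for calibrating a simultaneous confidence band for a high-dimensional mean (the talk concerns dependent data, where block-style multipliers would be needed; all names below are assumptions, not the speaker's code):

```python
import numpy as np

rng = np.random.default_rng(1)

def multiplier_bootstrap_quantile(X, B=500, alpha=0.05):
    """Gaussian multiplier bootstrap for the max-statistic
    T = max_j |sqrt(n) * (mean_j - mu_j)|. This is the i.i.d. version for
    illustration; dependent data would require block multipliers."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)               # center each coordinate
    stats = np.empty(B)
    for b in range(B):
        w = rng.normal(size=n)            # Gaussian multiplier weights
        stats[b] = np.abs(Xc.T @ w).max() / np.sqrt(n)
    return np.quantile(stats, 1 - alpha)

# Simultaneous confidence intervals for a p-dimensional mean (true mean is 0)
n, p = 200, 50
X = rng.normal(size=(n, p))
q = multiplier_bootstrap_quantile(X)
lo = X.mean(axis=0) - q / np.sqrt(n)
hi = X.mean(axis=0) + q / np.sqrt(n)
coverage = np.mean((lo <= 0) & (0 <= hi))
```

The bootstrap quantile q calibrates the width so that all p intervals cover simultaneously with probability roughly 1 - alpha.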

Winter 2016

Time: 2pm, Wednesday March 16
Place: Tech M345
Speaker: Martin Wainwright, University of California

Title: Statistics Meets Optimization: Fast Randomized Algorithms for Large Data Sets

Abstract: Large-scale data sets are now ubiquitous throughout engineering and science, and present a number of interesting challenges at the interface between statistics and optimization. In this talk, we discuss the use of randomized dimensionality reduction techniques, also known as sketching, for obtaining fast but approximate solutions to large-scale convex programs. Using information-theoretic techniques, we first reveal a surprising deficiency of the most widely used sketching technique. We then show how a simple iterative variant leads to a much faster algorithm, and how it generalizes naturally to a randomized version of the Newton algorithm with provable guarantees.
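
A minimal sketch of the classical sketch-and-solve approach for least squares, using a Gaussian sketching matrix (the simplest case; the talk's point is that this plain version is statistically deficient, and that iterative and Newton-type variants do better; all names and sizes below are my own):

```python
import numpy as np

rng = np.random.default_rng(2)

def sketched_least_squares(A, b, m):
    """Classical sketch-and-solve: compress the n-row problem to m rows
    with a Gaussian sketch S, then solve the small problem
    min_x || S A x - S b ||."""
    n = A.shape[0]
    S = rng.normal(size=(m, n)) / np.sqrt(m)   # random sketching matrix
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Tall least-squares problem: n = 5000 rows, d = 20 unknowns
n, d = 5000, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, m=200)
```

The sketched solve works with a 200 x 20 system instead of 5000 x 20, trading a little accuracy for a large reduction in cost.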

Time: 11am, Wednesday March 9
Place: Basement classroom, Department of Statistics
Speaker: Dr. Arend Kuyper, Northwestern University

Title: Re-Thinking Teacher Evaluation Based on Student Test Score Data: An Alternative IRT Method

Abstract: The focus of this presentation will be a proposal for an alternative method of teacher evaluation using student achievement data. Standard methods are limited to either value-added or student growth percentile techniques. The proposed method conceptualizes the process differently, which motivates anchoring it in Item Response Theory (IRT). IRT methodologies are well established and already form the basis of most large-scale assessments. For example, even though we are unable to directly observe a student's math ability, we can estimate it by observing her responses to a series of well-aligned questions. In much the same way, we are unable to directly observe a teacher's instructional ability. Our proposed method requires a set of well-aligned items that can be used to estimate a teacher's instructional ability, and our conceptualization designates the teacher's students as those items. By leveraging well-established IRT practices, we can analyze responses to the items (students) and estimate a teacher's instructional ability.
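
For readers unfamiliar with IRT, here is a minimal sketch of ability estimation in the Rasch model, the simplest IRT model (a hypothetical illustration of the general idea, not the speaker's actual method; all names and values are my own):

```python
import numpy as np

def rasch_ability(responses, difficulties, iters=50):
    """MLE of a single ability theta in the Rasch model, where
    P(correct on item j) = 1 / (1 + exp(-(theta - b_j))),
    computed by Newton's method on the concave log-likelihood."""
    theta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
        grad = np.sum(responses - p)       # score function
        hess = -np.sum(p * (1.0 - p))      # negative observed information
        theta -= grad / hess               # Newton update
    return theta

# Four items with symmetric difficulties; two correct answers out of four
theta_hat = rasch_ability(np.array([1, 1, 0, 0]),
                          np.array([-1.0, -0.5, 0.5, 1.0]))
```

In the proposed re-conceptualization, the "items" would be the teacher's students and theta the teacher's instructional ability.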

Time: 11am, Wednesday February 24
Place: Basement classroom, Department of Statistics
Speaker: Dr. Desale Habtzghi, Associate Professor of Statistics, DePaul University

Title: Nonparametric estimation of hazard rate function subject to monotonicity, convexity and other shape constraints

Abstract: In this talk, we introduce a new nonparametric method for estimating the hazard function under shape restrictions. This is a topic of practical importance because in survival analysis and reliability applications one often has a prior notion of the physical shape of the underlying hazard rate function, while a fully parametric form for it may not be appropriate. We adopt a nonparametric approach, assuming no specific parametric form for the density and hazard rate but taking the shape of the underlying hazard rate as known (decreasing, increasing, concave, convex, or bathtub-shaped). We present an efficient algorithm for computing the constrained estimator, together with its theoretical justification. We also show how the estimation procedures can be used with right-censored data. We evaluate the performance of the estimator via simulation studies, and the applicability and flexibility of our method are illustrated by analyzing real data sets. In addition, our work on the following related topics will be discussed: (1) estimation of the hazard function in the presence of dependent censoring; (2) goodness-of-fit tests in the presence of shape restrictions on the hazard function; (3) estimation of mean residual life using shape-constrained regression.
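
As a small illustration of monotone shape constraints, here is the pool-adjacent-violators algorithm (PAVA), a standard building block for isotonic estimation, applied to a noisy nondecreasing sequence (a generic sketch of the constraint machinery, not the speaker's algorithm):

```python
import numpy as np

def pava(y, w):
    """Pool-adjacent-violators algorithm: weighted least-squares fit of a
    nondecreasing sequence to y. This is the standard building block for
    monotone shape-constrained estimators such as a monotone hazard."""
    vals, wts, sizes = [], [], []
    for yi, wi in zip(map(float, y), map(float, w)):
        vals.append(yi); wts.append(wi); sizes.append(1)
        # pool adjacent blocks while monotonicity is violated
        while len(vals) > 1 and vals[-2] > vals[-1]:
            tw = wts[-2] + wts[-1]
            vals[-2:] = [(wts[-2] * vals[-2] + wts[-1] * vals[-1]) / tw]
            wts[-2:] = [tw]
            sizes[-2:] = [sizes[-2] + sizes[-1]]
    out = []
    for v, s in zip(vals, sizes):
        out.extend([v] * s)      # expand each pooled block back out
    return np.array(out)

# Noisy crude hazard values on a grid, smoothed under a nondecreasing constraint
rng = np.random.default_rng(3)
raw = np.linspace(0.2, 2.0, 40) + 0.3 * rng.normal(size=40)
fit = pava(raw, np.ones(40))
```

The fit is the closest nondecreasing sequence to the raw values in weighted least squares; analogous projections enforce the other shapes (convex, bathtub, etc.) mentioned in the abstract.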

Time: 11am, Wednesday February 10
Place: Basement classroom, Department of Statistics, 2006 Sheridan Road, Evanston
Speaker: Timothy E. O'Brien, Professor, Loyola University Chicago

Title: Informative and Efficient Experimental Design Approaches in Applied Research

Abstract: Analysis of multicategory response data, in which a multinomial dependent variable is linked to selected covariates, involves several rival models: the adjacent category (AC), baseline category logit (BCL), two variants of the continuation ratio (CR), and the proportional odds (PO) models. For a given set of data, the fits and predictions associated with these models can vary dramatically, as can the associated optimal designs (which are then used to estimate the respective model parameters).

Using real datasets, this talk first illustrates fits of these models and highlights the associated optimal designs, pointing out the inadequacy of these experimental designs for detecting lack of fit. We next introduce and illustrate a new generalized logit (GL) model which generalizes all five of the above models, and demonstrate how it can be used to find “robust” optimal designs. These latter designs are thus useful both for parameter estimation and for checking goodness of fit. Extensions are also provided for synergy models used in bioassay. Key illustrations are provided, as are appropriate software tools.

Time: 11am, Wednesday January 13
Place: Basement classroom, Department of Statistics, 2006 Sheridan Road, Evanston
Speaker: Hao Zhang, Professor and Head of Statistics, Purdue University

Title: The Role of Covariances in Spatial Statistics

Abstract: Covariances play a vital role in spatial statistics because spatial data are correlated in most situations. It is well understood how covariances affect kriging (i.e., the best linear unbiased prediction). Such an understanding helps with the development of statistically and computationally efficient algorithms for estimation and prediction in spatial statistics. However, relatively little is known for cokriging in the multivariate case. In this talk, I will review some interesting and key facts that are known for kriging and present a new result for cokriging and some open problems.
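
A minimal sketch of simple kriging (the known-mean case) showing exactly how the covariance function enters the predictor; the covariance model, sites, and values below are hypothetical:

```python
import numpy as np

def simple_kriging(X, y, x0, cov, mu=0.0):
    """Simple kriging (BLUP with known mean mu): predict Z(x0) from
    observations y at 1-D sites X, given a covariance function cov."""
    C = cov(X[:, None], X[None, :])     # n x n covariance matrix of the data
    c0 = cov(X, x0)                     # covariances between data and target
    weights = np.linalg.solve(C, c0)    # kriging weights C^{-1} c0
    return mu + weights @ (y - mu)

def expcov(s, t):
    """Exponential covariance on the line (hypothetical unit range/sill)."""
    return np.exp(-np.abs(s - t))

X = np.array([0.0, 1.0, 2.5])
y = np.array([1.0, 0.3, -0.4])
pred = simple_kriging(X, y, 1.0, expcov)   # predict at an observed site
```

Because the weights solve a system built entirely from the covariance, every property of kriging discussed in the talk (e.g. exact interpolation at observed sites, reversion to the mean far from data) flows directly from the choice of covariance function; cokriging replaces cov with a matrix-valued cross-covariance.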