2010: Department of Statistics and Data Science

2010

Fall 2010

Wednesday Nov 10, 11am

Speaker: Brent Logan, Associate Professor

Division of Biostatistics, Medical College of Wisconsin

Title: Marginal Models for Clustered Time-to-Event Data with Competing Risks Using Pseudovalues

Abstract:
Many time-to-event studies are complicated by the presence of competing risks and by nesting of individuals within a cluster, such as patients in the same center in a multicenter study. Several methods have been proposed for modeling the cumulative incidence function with independent observations. However, when subjects are clustered, one needs to account for the presence of a cluster effect either through frailty modeling of the hazard or subdistribution hazard, or by adjusting for the within-cluster correlation in a marginal model. We propose a method for modeling the marginal cumulative incidence function directly. We compute leave-one-out pseudo-observations from the cumulative incidence function at several time points. These are used in a generalized estimating equation to model the marginal cumulative incidence curve, and obtain consistent estimates of the model parameters. A sandwich variance estimator is derived to adjust for the within-cluster correlation. The method is easy to implement using standard software once the pseudovalues are obtained, and is a generalization of several existing models. Simulation studies show that the method works well to adjust the SE for the within-cluster correlation. We illustrate the method on a dataset looking at outcomes after bone marrow transplantation.
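
As a rough illustration of the leave-one-out pseudo-observations mentioned in the abstract, the following minimal sketch computes jackknife pseudo-values of a cumulative incidence estimate at a single time point. It is not the speaker's implementation. It assumes the standard Aalen-Johansen estimator and the usual pseudo-value formula n*F(t) - (n-1)*F_(-i)(t). The function names `cumulative_incidence` and `pseudo_values`, and the coding of cause 0 as censoring, are hypothetical choices made for this example.

```python
def cumulative_incidence(times, causes, t, cause=1):
    """Aalen-Johansen estimate of P(T <= t, failure from `cause`).

    Assumed coding: causes[i] == 0 means subject i was censored at times[i].
    """
    times, causes = list(times), list(causes)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current event time
    cif = 0.0
    event_times = sorted({u for u, c in zip(times, causes) if c != 0 and u <= t})
    for tj in event_times:
        at_risk = sum(1 for u in times if u >= tj)
        d_all = sum(1 for u, c in zip(times, causes) if u == tj and c != 0)
        d_cause = sum(1 for u, c in zip(times, causes) if u == tj and c == cause)
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif


def pseudo_values(times, causes, t, cause=1):
    """Leave-one-out pseudo-values: n * F(t) - (n - 1) * F_(-i)(t) for each subject i."""
    times, causes = list(times), list(causes)
    n = len(times)
    full = cumulative_incidence(times, causes, t, cause)
    values = []
    for i in range(n):
        # Re-estimate the cumulative incidence with subject i removed.
        loo = cumulative_incidence(
            times[:i] + times[i + 1:], causes[:i] + causes[i + 1:], t, cause
        )
        values.append(n * full - (n - 1) * loo)
    return values
```

Without censoring, each pseudo-value reduces to the indicator that the subject failed from the cause of interest by time t. With censoring, the pseudo-values stand in for these partly unobserved indicators, and in the approach described above they would then serve as responses in a generalized estimating equation with clusters as the grouping unit.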

Wednesday Oct 27, 11am

Speaker: Dabao Zhang, Associate Professor

Department of Statistics, Purdue University

Title: Penalized Orthogonal-Components Regression for Selecting Sparse Variables in High-Dimensional Data

Abstract: We propose a penalized orthogonal-components regression (POCRE) to select variables from high-dimensional low-sample-size data. Orthogonal components are sequentially constructed to maximize, upon standardization, their correlation to the response residuals. A new penalization framework, implemented via empirical Bayes thresholding, is presented to effectively identify sparse predictors of each component. POCRE is computationally efficient owing to its sequential construction of leading sparse principal components. In addition, such construction offers other properties such as grouping highly correlated predictors and allowing for collinear or nearly collinear predictors. With multivariate responses, POCRE can construct common components and thus build up latent-variable models for high dimensional data.

Wednesday Oct 13, 11am

Speaker: Liping Tong, Assistant Professor

Department of Mathematics and Statistics, Loyola University

Title: Co-evolution Model for Dynamic Social Network and Behavior

Abstract: An individual's behaviors, such as hours spent watching television, playing sports, or unhealthy eating habits, may be influenced by the behaviors of friends. However, preferences for these behaviors may also influence the choice of friends; for example, two children who enjoy playing the same sport are more likely to become friends. To study the interdependence of social network and behavior, Snijders et al. have developed stochastic actor-based modeling (SABM) methods for the co-evolution process, which are useful for longitudinal social network and behavior data when the behavior variables are discrete and take a limited number of possible values. Unfortunately, because the evolution function for the behavior variable has an exponential form, SABM can generate unrealistic results when the behavior variable is continuous or has a large range. To model a continuous behavior variable realistically, we propose a co-evolution process in which the network evolution is based on an exponential random graph model and the behavior evolution is based on a linear model.

Wednesday Sep 29, 11am

Speaker: Mihails Levins, Associate Professor

Department of Statistics, Purdue University

Title: An EM algorithm for estimation of finite nonparametric mixtures of multivariate densities

Abstract: We propose an algorithm for nonparametric estimation of finite mixtures of multivariate random vectors that is a true EM algorithm. The vectors are assumed to have independent coordinates conditional upon knowing the mixture component they come from, but otherwise their density functions are completely unspecified. To the best of our knowledge, this is the first algorithm for estimation of nonparametric mixtures that specifies an explicit likelihood function and has a verifiable ascent property. We show that, under reasonable regularity conditions, there exists a unique solution of the likelihood maximization problem that specifies an algorithm and that the proposed algorithm indeed converges to that solution. The algorithm can be applied to a mixture with any finite number of components and any dimensionality of the underlying random vectors. Performance of the algorithm is illustrated using both simulated and real data.

Spring 2010

Wednesday May 26, 11am

Speaker: Leland Wilkinson, Professor

SYSTAT

Department of Computer Science, University of Illinois at Chicago

Dept. of Statistics, Northwestern University

Title: Linf: An L-infinity Classifier

Abstract: We introduce a classifier based on the L-infinity norm. This classifier, called Linf, is a composition of four stages (transforming, projecting, binning, and covering) that are designed to deal with the curse of dimensionality, computational complexity, and nonlinear separability. Linf is not a hybrid or modification of existing classifiers; it employs a new covering algorithm. The accuracy of Linf on widely-used benchmark classification datasets exceeds, on average, the accuracy of competitive classifiers such as Support Vector Machines and Random Forests. Its computational complexity is sublinear in the number of instances and the number of variables, and subquadratic in the number of classes. This is work with Anushka Anand and Dang Nhon Tuan, Dept. of Computer Science, University of Illinois at Chicago.

Wednesday May 5, 11am

Speaker: Jing Wang, Assistant Professor of Statistics
University of Illinois at Chicago

Title: Efficient and fast spline-backfitted kernel smoothing of additive models

Abstract: A great deal of effort has been devoted to inference for additive models in the last decade. Among existing procedures, kernel-type estimators are too costly to implement for high dimensions or large sample sizes, while spline-type estimators provide no asymptotic distribution or uniform convergence. We propose a one-step backfitting estimator of the component function in an additive regression model, using spline estimators in the first stage followed by kernel/local linear estimators. Under weak conditions, the proposed estimator's pointwise distribution is asymptotically equivalent to that of a univariate kernel/local linear estimator, hence the dimension is effectively reduced to one at any point. This dimension reduction holds uniformly over an interval under the assumption of normal errors. Monte Carlo evidence supports the asymptotic results for dimensions ranging from low to very high, and sample sizes ranging from moderate to large. The proposed confidence band is applied to the Boston housing data for linearity diagnosis. This is joint work with Professor Lijian Yang at Michigan State University.

Wednesday April 21, 11am

Speaker: Xuming He, Professor of Statistics
University of Illinois at Urbana-Champaign

Title: On Dimensionality of Mean Structure from a Single Data Matrix

Abstract: We consider inference from data matrices that have low dimensional mean structures. In educational testing and in probe-level microarray data, estimation and inference are often made from a single data matrix believed to have a uni-dimensional mean structure. In this talk, we focus on probe-level microarray data to examine the adequacy of a uni-dimensional summary for characterizing the data matrix of each probe-set. To do so, we propose a low-rank matrix model, and develop a useful framework for testing the adequacy of uni-dimensionality against targeted alternatives. We analyze the asymptotic properties of the proposed test statistics as the number of rows (or columns) of the data matrix tends to infinity, and use Monte Carlo simulations to assess their small sample performance. Applications of the proposed tests to GeneChip data show that evidence against a uni-dimensional model is often indicative of practically relevant features of a probe-set. (Part of the talk is based on ongoing work of Xingdong Feng, a doctoral student at the University of Illinois.)

Wednesday April 7, 11am

Speaker: Fangfang Wang, Assistant Professor,
Department of Information and Decision Sciences
University of Illinois at Chicago

Title: The HYBRID GARCH Class of Models

Abstract: We propose a general GARCH framework that allows the use of different frequency returns to model conditional heteroskedasticity. We call the class of models High FrequencY Data-Based PRojectIon-Driven (HYBRID) GARCH models as the GARCH dynamics are driven by what we call HYBRID processes. We study three broad classes of HYBRID processes: (1) parameter-free processes that are purely data-driven, (2) structural HYBRIDs where one assumes an underlying DGP for the high frequency data and finally (3) HYBRID filter processes. We develop the asymptotic theory of various estimators and study their properties in small samples via simulations. This is work with Xilong Chen and Eric Ghysels.

Winter 2010

Wednesday March 10, 2010, 11am

Speaker: Stephen Stigler, Professor of Statistics, University of Chicago

Title: Darwin, Galton, and the Statistical Enlightenment

Abstract: Francis Galton invented multivariate analysis in 1885. The main outline of that advance is fairly well known, but the link to Darwin has been inadequately studied. I will discuss that link and a hitherto unnoticed major step in that development, and I will tell how this advance led to a remarkable 50-year period that might justly be called the Statistical Enlightenment, a period that included the reinvention of Bayesian inference. Galton's algorithm for simulating posterior distributions will be highlighted. A previously unsung hero of the story, working in Lake Forest, Illinois in 1889, will be saluted.

Wednesday Feb 24, 2010, 11am

Speaker: Dale Rosenthal, Assistant Professor of Finance, UIC

Title: A Network Model of Counterparty Risk

Abstract:
Two network structures of contracts on a risky asset are explored in a two-period model. One structure represents a bilaterally-cleared OTC market; the other represents a centrally-cleared market. An exogenous bankruptcy occurs before period one, inducing counterparties to trade with price impact. The two market structures are shown to yield different price impact and volatility. Further, market-induced bankruptcy of a large (financial) firm is shown to yield two undesirable phenomena in bilateral markets: checkmate and hunting. Checkmate occurs when a counterparty cannot expect to prevent impending bankruptcy. Hunting occurs when counterparties push markets further than a central counterparty would, inducing further bankruptcies. These counterparties may even expect to profit from such follow-on bankruptcies. The results suggest that bilateral OTC markets have externalities (larger distress volatility) which can be priced relative to centrally-cleared markets. This might offer guidance on when and how much incentive to offer for markets to transition from one structure to another. The results also suggest that in times of distress coordination by market authorities has value.

Wednesday Feb 10, 2010, 11am

Speaker: Zhengjun Zhang, Assistant Professor of Statistics, University of Wisconsin at Madison

Title: Examining Extremal Dependence in Continental USA Climate Data

Abstract: In recent years, extremal climatic conditions have been observed more often, and some climate variables are not only dependent but also extremally dependent. Identification of extremal dependence among observations is challenging and remains an open problem. This talk introduces a class of tail quotient correlation coefficients (TQCC) which allows the underlying threshold values to be random. The limit distribution of the TQCC under the null hypothesis of extremal independence is derived. Test statistics for extremal independence are constructed and shown to be consistent under the alternative hypothesis of extremal dependence. Motivated by TQCC, the talk introduces a broader class of nonlinear quotient correlation coefficients (NQCC) for characterizing nonlinear dependence between random variables. We apply TQCC and NQCC to investigate extremal dependence and nonlinear dependence of daily precipitation in the US during 1950-1999, recorded at 5873 stations in the National Climatic Data Center rain gauge data. Our results indicate nonstationarity, spatial clusters, extremal dependence, and nonlinear dependence in the data. They provide useful information for next generation climate models.