2006: Department of Statistics and Data Science

2006

Fall 2006

Monday, September 25, 2006, 11 am

Speaker: Professor Ji-Ping Wang, Department of Statistics, Northwestern University

Title: Statistical models for nucleosome DNA alignment and linker length prediction in Eukaryotic cells

Abstract: Eukaryotic DNAs exist in a highly compacted form known as chromatin. The nucleosome is the fundamental repeating subunit of chromatin, formed by wrapping a short stretch of DNA, 147 bp in length, around four pairs of histone proteins. Nucleosome DNA obtained by experiments, however, varies in length due to imperfect digestion. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. To further investigate chromatin structure, we experimentally cloned and sequenced di-nucleosome sequences from yeast. Each dinucleosome sequence roughly covers two nucleosomes (located toward the two ends) with a linker DNA in between. An HMM model is trained based on the nucleosome sequence alignment for prediction of nucleosome positioning. Results show that Eukaryotic cells do favor periodic linker length in chromatin forming on a roughly 10 bp basis.

Monday, October 9, 2006, 11 am

Speaker: Professor Zhigang Zhang, Department of Statistics, Oklahoma State University

Title: A Class of Transformed Mean Residual Life Models with Censored Survival Data

Monday, October 23, 2006, 11 am

Speaker: Dr. Lanju Zhang, Department of Biostatistics, MedImmune Inc.

Title: Response-Adaptive Randomization for Survival Trials: The Parametric Approach

Abstract: Few papers in the literature deal with response-adaptive randomization procedures for survival outcomes and those that do either dichotomize the outcomes or use a nonparametric approach. In this talk, the optimal allocation approach and a parametric response-adaptive randomization procedure are used under exponential and Weibull distributions. The optimal allocations are derived for both distributions and the doubly-adaptive biased coin design is applied to target the optimal allocations. The asymptotic variance of the procedure is obtained for the exponential distribution. The effect of intrinsic delay of survival outcomes is treated. These findings are based on rigorous theory, but also verified by simulation. We illustrate our procedure by redesigning a clinical trial.

Monday, October 30, 2006, 11 am

Speaker: Professor Jan Hannig, Department of Statistics, Colorado State University

Title: Statistical Model for Tracking with Applications

Abstract: We propose a new tracking model that allows for birth, death, splitting and merging of targets. Targets are also allowed to go undetected for several frames. The splitting and merging of targets is a novel addition for a statistically based tracking model. This addition is essential for the tracking of storms, which is the motivation for this work. The utility of this tracking method extends well beyond the tracking of storms. It can be valuable in other tracking applications that have splitting or merging, such as vortexes, radar/sonar signals, or groups of people. The method assumes that the location of a target behaves like a Gaussian Process when it is observable. A Markov Chain model decides when the birth, death, splitting, or merging of targets takes place. The tracking estimate is achieved by an algorithm that finds the tracks that maximize the conditional density of the unknown variables given the data. The problem of how to quantify the confidence in a tracking estimate is addressed as well. Finally, some sufficient conditions for consistency of this tracking estimate are presented and almost sure convergence of the tracking estimate to the true path is proved. The practical suitability of this method is then demonstrated on simulated and real data.

Based on joint work with Thomas C.M. Lee and Curtis B. Storlie.

Monday, November 6, 2006 at 10 am

Speaker: Professor Alfred Rademaker, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University

Title: The design and analysis of cancer clinical trials

Abstract: Cancer clinical trials run the spectrum from Phase 0 feasibility studies to Phase IV surveillance studies. This talk will focus on statistical methods related to Phase I and Phase II clinical trials. For Phase I, the standard 3+3 design as well as the continual reassessment method will be discussed. For Phase II studies, the Simon 2-stage design will be described, as well as the use of conditional methods for interval estimation of response rate. Other design variations, such as randomized Phase II or combined Phase II/Phase III studies, will also be presented.
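As a rough illustration of the Simon two-stage design mentioned in the abstract above, the sketch below computes the operating characteristics of one such design from binomial probabilities. The design parameters (stop after 10 patients if at most 1 responds; otherwise enroll up to 29 and call the treatment promising if more than 5 respond) and the response rates 0.10 and 0.30 are illustrative assumptions, not values taken from the talk, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def simon_two_stage(r1, n1, r, n, p):
    """Operating characteristics of a two-stage design at true response rate p.

    Stage 1: treat n1 patients and stop for futility if responses <= r1.
    Stage 2: treat n - n1 more; declare the treatment promising if the
    total number of responses exceeds r.
    Returns (probability of early termination, probability of declaring
    the treatment promising, expected sample size).
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)
    promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Continue to stage 2 with x1 responses; need more than r - x1 there.
        promising += binom_pmf(x1, n1, p) * (1 - binom_cdf(r - x1, n2, p))
    expected_n = n1 + (1 - pet) * n2
    return pet, promising, expected_n


# Assumed example: uninteresting rate p0 = 0.10, target rate p1 = 0.30.
pet0, alpha, en0 = simon_two_stage(r1=1, n1=10, r=5, n=29, p=0.10)
pet1, power, en1 = simon_two_stage(r1=1, n1=10, r=5, n=29, p=0.30)
print(f"PET(p0)={pet0:.4f}  type I error={alpha:.4f}  E[N | p0]={en0:.2f}")
print(f"power at p1={power:.4f}")
```

Evaluating the same function at the uninteresting and target rates gives the type I error and power respectively, which is how candidate designs are usually compared.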

Spring 2006

Tuesday, March 7, 2006 at 11 am

Speaker: Ms. Cindy Xin Wang, Department of Statistics, Northwestern University

Title: Gatekeeping Procedures Based on Weighted Bonferroni Tests for Multiple Endpoints in Dose Finding Studies

Abstract: In many dose finding studies there are hierarchically ordered endpoints (e.g., primary, secondary, etc.) and a given dose is compared with a control on any endpoint conditional on the tests on the higher-ordered endpoints being significant (serial gatekeeping). It is required to control the familywise error rate at a designated level taking into account multiplicity of tests. We give a closed procedure (Marcus, Peritz and Gabriel 1976) for this problem by applying the general and flexible tree-structured testing approach to gatekeeping problems developed in Dmitrienko, Wiens, Tamhane and Wang (2006). The proposed procedure uses weighted Bonferroni tests for testing intersection hypotheses. For an easier implementation of this closed procedure, we give an equivalent stepwise procedure that uses penalized Bonferroni tests for all endpoints except the last, for which it uses a penalized Holm test. The penalty charged at each step of testing is inversely proportional to a so-called rejection gain factor, which depends on the number of rejections at earlier steps and the weights assigned to those rejected hypotheses. The method is applied to data from a diabetes drug trial with three endpoints. Extensions in which the Bonferroni test is replaced with the Simes or resampling or the Dunnett test are indicated.

Tuesday, April 4, 2006, 11 am

Speaker: Ms. Yang Ge, Department of Statistics, Northwestern University

Title: On Consistency of Bayesian Inference with Mixtures of Logistic Regression Models

Abstract: This is a theoretical study of the consistency properties of Bayesian inference using mixtures of logistic regression models. When standard logistic regression models are combined in a ‘mixtures of experts’ set-up, a flexible model is formed to model the relationship between a binary (yes-no) response y and a vector of predictors x. Bayesian inference conditional on the observed data can then be used for regression and classification. This study gives conditions on choosing the number of experts (i.e., number of mixing components) k, or choosing a prior distribution for k, so that Bayesian inference is ‘consistent’, in the sense of ‘often approximating’ the underlying true relationship between y and x. The resulting classification rule is also ‘consistent’, in the sense of having near-optimal performance in classification. We show these desirable consistency properties with a nonstochastic k growing slowly with the sample size n of the observed data, or with a random k that takes large values with nonzero but small probabilities.

Monday, April 24, 2006 at 3:30 pm

Speaker: Professor Mohsen Pourahmadi, Division of Statistics, Northern Illinois University

Title: Generalized Linear Models for the Covariance Matrix of Longitudinal Data

Abstract: We survey the progress made in modelling covariance matrices from the perspective of generalized linear models (GLM) and show how one can move beyond the use of the identity and logarithmic link functions, and prespecified structures. Observing that most time-domain models (ARMA, state-space, ...) in time series analysis are means to diagonalize a Toeplitz covariance matrix via a unit lower triangular matrix (Cholesky decomposition), we discuss the distinguished role of the Cholesky decomposition in providing a systematic and data-based procedure for formulating and fitting parsimonious models for general covariance matrices guaranteeing the positive-definiteness of the estimates. Pulling together some techniques from regression and time series analyses provides the necessary tools for the procedure, which reduces the unintuitive task of modelling covariance matrices to that of a sequence of regression models. The procedure is illustrated using a real longitudinal dataset. Once a bona fide GLM framework for modelling covariances is found, its Bayesian, nonparametric, generalized additive and other extensions can be developed in direct analogy with the respective extensions of the traditional GLM.
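The diagonalization described in the abstract above can be illustrated with a small worked example. The sketch below is not the speaker's procedure; it assumes a 3 x 3 AR(1)-type Toeplitz covariance matrix with correlation 0.5 (an arbitrary choice) and uses hypothetical helper names. It factors the matrix as L D L' with L unit lower triangular, then inverts L to obtain T, so that T Sigma T' = D. The below-diagonal entries of T are the negatives of the coefficients from regressing each measurement on its predecessors, and D holds the corresponding prediction variances.

```python
def ldl_decompose(sigma):
    """Factor a symmetric positive-definite matrix as L D L'.

    Returns (L, d) with L unit lower triangular (list of lists) and
    d the list of diagonal entries of D.
    """
    n = len(sigma)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = sigma[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (sigma[i][j] - s) / d[j]
    return L, d


def unit_lower_inverse(L):
    """Invert a unit lower triangular matrix by forward substitution."""
    n = len(L)
    T = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i):
            T[i][j] = -sum(L[i][k] * T[k][j] for k in range(j, i))
    return T


# Assumed example: AR(1) correlation matrix, Sigma[i][j] = rho ** |i - j|.
rho = 0.5
sigma = [[rho ** abs(i - j) for j in range(3)] for i in range(3)]

L, d = ldl_decompose(sigma)
T = unit_lower_inverse(L)

# For AR(1), each row of T regresses on the previous measurement only,
# so T has -rho on the first subdiagonal and zeros below it.
for row in T:
    print([round(x, 4) for x in row])
print("innovation variances:", [round(x, 4) for x in d])
```

Because any unit lower triangular T and any positive diagonal D give back a positive-definite matrix, modelling the entries of T and the logarithms of D without constraints keeps the estimated covariance matrix positive definite.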

Tuesday, May 2, 2006 at 11 am

Speaker: Professor Hakan Demirtas, Division of Epidemiology and Biostatistics, University of Illinois at Chicago

Title: Multiple imputation under Bayesianly smoothed random-coefficient hierarchical pattern-mixture models for nonignorably missing longitudinal data

Abstract: Conventional pattern-mixture models can be highly sensitive to model misspecification. In many longitudinal studies, where the nature of the drop-out and the form of the population model are unknown, interval estimates from any single pattern-mixture model may suffer from undercoverage, because uncertainty about model misspecification is not taken into account. In this talk, I will introduce a new class of Bayesian random coefficient pattern-mixture models to address potentially non-ignorable drop-out. Instead of imposing hard equality constraints to overcome inherent inestimability problems in pattern-mixture models, I propose to smooth the polynomial coefficient estimates across patterns using a hierarchical Bayesian model that allows random variation across groups. Using real and simulated data, I show that multiple imputation under a three-level linear mixed-effects model which accommodates a random level due to drop-out groups can be an effective method to deal with non-ignorable drop-out by allowing model uncertainty to be incorporated into the imputation process.

Papers that are relevant to this talk:

Demirtas, H. & Schafer, J.L. (2003). On the performance of random-coefficient pattern-mixture models for non-ignorable drop-out. Statistics in Medicine, 22, 2553-2575.

Demirtas, H. (2004). Modeling incomplete longitudinal data. Journal of Modern Applied Statistical Methods, Volume 3, No 2, 305-321.

Demirtas, H. (2005). Multiple imputation under Bayesianly smoothed pattern-mixture models for non-ignorable drop-out. Statistics in Medicine, 24, 2345-2363.

Demirtas, H. (2005). Bayesian analysis of hierarchical pattern-mixture models for clinical trials data with attrition and comparisons to commonly used ad-hoc and model-based approaches. Journal of Biopharmaceutical Statistics, Volume 15, Issue 3, 383-402.

Tuesday, May 9, 2006 at 11 am

Speaker: Professor Peter Song, Department of Statistics and Actuarial Science, University of Waterloo

Title: Maximization by Parts in Likelihood Inference

Abstract: In this talk I will present a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest. The method circumvents the need to compute second order derivatives of the full likelihood function. It exploits the structure of certain models that yield a natural decomposition of a very complicated likelihood function. In this decomposition, the first part is a log likelihood from a simply analyzed model and the second part is used to update estimates from the first. Convergence properties of this iterative (fixed point) algorithm are examined and asymptotics are derived for estimators obtained by using only a finite number of iterations. I will illustrate several examples in the presentation, including multivariate Gaussian copula models, nonnormal random effects models, generalized linear mixed models, and state space models. Properties of the algorithm and of estimators are discussed in detail via simulation studies on a bivariate copula model and a nonnormal linear random effects model.

Tuesday, May 16, 2006 at 11 am

Speaker: Professor Edward C. Malthouse, Department of Integrated Marketing Communications, Medill School, Northwestern University

Title: Conceptualizing and Measuring Media Engagement and its Effects

Abstract: We propose measuring the latent construct “media engagement” with a third-order confirmatory factor analysis model. The approach is tested using five large reader surveys of 100 and 50 newspapers, 100 magazines, and 39 and 8 media web sites. Over 400 qualitative interviews generated samples of items (questions) from the construct domain for the three media platforms. Consumer surveys measured the items on samples of readers. We used exploratory factor analysis (EFA) to develop scales measuring different dimensions of engagement and confirmatory factor analysis (CFA) to purify the scales further. Additional EFA was used to identify higher-order factors, which were then tested with confirmatory models. We contrast the higher-order factor structure for the three media platforms. Predictive validity is assessed by relating the engagement factors to outcome measures such as usage with random coefficient models and ridge regression. Three quasi-experiments evaluate the effect of engagement on advertising effectiveness.

(This is joint research with Bobby Calder, Marketing Department, Kellogg.)

Tuesday, May 30, 2006 at 11 am

Speaker: Professor Torben G. Andersen, Kellogg School of Management, Northwestern University and NBER

Title: Continuous-Time Models, Realized Volatilities, and Testable Distributional Implications for Daily Stock Returns

Abstract: We provide a framework for analyzing and understanding daily return distributions within the context of traditional continuous-time asset price processes. We develop a sequence of simple-to-implement distributional tests from transformed inter-daily returns. They hinge on the availability of intraday data for construction of nonparametric realized variation measures and jump detection statistics. Each step speaks to key features of the process underlying the discretely observed prices and should help in developing empirically more realistic models. For thirty large stocks, we find that time-varying diffusive volatility, jumps and leverage effects are all critical in order to describe the dynamic dependencies in the observed prices.

Coauthors:

Tim Bollerslev, Dept. of Economics and Fuqua School, Duke University and NBER

Per H. Frederiksen, Jyske Bank, Denmark

Morten Ø. Nielsen, Dept. of Economics, Cornell University
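
The realized variation measures referred to in the abstract above can be sketched in a few lines. The example below is a generic illustration, not the authors' test procedure: it assumes a made-up list of intraday returns for one day, computes realized variance as the sum of squared returns, and uses bipower variation (a standard jump-robust measure, chosen here as an assumption) to indicate how much of the day's variation may be due to jumps. All names and numbers are hypothetical.

```python
from math import pi, sqrt


def realized_variance(returns):
    """Sum of squared intraday returns."""
    return sum(r * r for r in returns)


def bipower_variation(returns):
    """(pi / 2) times the sum of products of adjacent absolute returns.

    Less sensitive to a single large return than realized variance,
    because a jump enters only through products with small neighbors.
    """
    return (pi / 2) * sum(abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1]))


def standardized_return(returns):
    """Daily return divided by the square root of realized variance."""
    return sum(returns) / sqrt(realized_variance(returns))


# Made-up intraday returns with one large move in the middle of the day.
intraday = [0.001, 0.001, 0.05, 0.001, 0.001]

rv = realized_variance(intraday)
bv = bipower_variation(intraday)
jump_part = max(rv - bv, 0.0)  # crude estimate of the jump contribution

print(f"RV={rv:.6f}  BV={bv:.6f}  jump part={jump_part:.6f}")
print(f"standardized daily return={standardized_return(intraday):.4f}")
```

With real data the intraday returns would come from high-frequency prices, and the standardized returns would then be compared against a reference distribution.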