Lectures & Special Events: Department of Statistics and Data Science

Lectures & Special Events

Special Seminar Talk

High-order Singular Value Decomposition

Wednesday, October 12, 2022

Time: 11:00 a.m. to 12:00 p.m. central time

Location: 2006 Sheridan, room B02

Speaker: Anru Zhang, Eugene Anson Stead, Jr. M.D. Associate Professor, Departments of Biostatistics & Bioinformatics, Computer Science, Mathematics, and Statistical Science, Duke University

Abstract: The analysis of high-order data, i.e., arrays with multiple directions, is motivated by a wide range of scientific applications and has become an important interdisciplinary topic in data science. In this talk, we discuss how to perform SVD on general tensors or tensors with structural assumptions, e.g., sparsity, smoothness, and longitudinality. Through the developed frameworks, we can achieve accurate denoising for 4D scanning transmission electron microscopy images; in longitudinal microbiome studies, we can extract key components in the trajectories of bacterial abundance, identify representative bacterial taxa for these key trajectories, and group subjects based on the change of bacterial abundance over time. We also illustrate how we develop new statistically optimal methods and computationally efficient algorithms that exploit useful information from high-dimensional high-order data based on the modern theories of computation and non-convex optimization.
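For readers unfamiliar with the idea of an SVD on tensors, the following is a minimal sketch of the classical truncated higher-order SVD in Python with NumPy. It is not the speaker's method: the structured variants named in the abstract (sparse, smooth, longitudinal) are not shown, and the function names, tensor sizes, ranks, and noise level are all assumptions chosen for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize: bring `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape m x n_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # (..., n_mode)
    result = moved @ matrix.T               # (..., m)
    return np.moveaxis(result, -1, mode)


def truncated_hosvd(tensor, ranks):
    """Return a core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of the mode-k unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each subspace
    return core, factors


def reconstruct(core, factors):
    """Map the core back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical example: a low-rank 15 x 12 x 10 signal plus Gaussian noise.
rng = np.random.default_rng(0)
ranks = (2, 3, 2)
true_core = rng.normal(size=ranks)
true_factors = [
    np.linalg.qr(rng.normal(size=(n, r)))[0]
    for n, r in zip((15, 12, 10), ranks)
]
signal = reconstruct(true_core, true_factors)
noisy = signal + 0.05 * rng.normal(size=signal.shape)

core, factors = truncated_hosvd(noisy, ranks)
denoised = reconstruct(core, factors)
print("error before:", np.linalg.norm(noisy - signal))
print("error after: ", np.linalg.norm(denoised - signal))
```

The sketch computes each factor from the SVD of one unfolding and then projects the tensor onto the resulting subspaces; iterative refinements of these factors are a common next step but are left out here.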

Anru Zhang is the Eugene Anson Stead, Jr. M.D. Associate Professor at the Department of Biostatistics & Bioinformatics and Associate Professor at the Departments of Computer Science, Mathematics, and Statistical Science at Duke University. He obtained his Bachelor's degree in mathematics from Peking University in 2010 and his Ph.D. from the University of Pennsylvania in 2015. His work focuses on high-dimensional statistical inference, non-convex optimization, computational complexity, tensor learning, statistical learning theory, and applications in electronic health records, genomics, microbiome, and computational imaging. He won the NSF CAREER Award (2020), the ASA Gottfried E. Noether Junior Award (2021), the Bernoulli Society New Researcher Award (2021), the ICSA Outstanding Young Researcher Award (2021), and the IMS Tweedie Award (2022, awarded by the Institute of Mathematical Statistics to a single mid-career statistician/probabilist each year).

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

Past Lectures & Special Events

Sally C. Morton

Lessons Learned as a Statistician

Wednesday, February 19, 2020
3:30 p.m.
Transportation Center, 600 Foster Street

Abstract: Evidence-based decision-making, particularly in the age of big data, demands statisticians not only contribute as scientists but also excel as leaders, communicators and collaborators. In this talk, I will reflect on my own journey as a statistician – opportunities, successes, and missteps. Case studies from the healthcare arena will illustrate the importance of statistical science in policy, as well as underscore the need for statisticians to engage decision-makers and communicate information from a policy perspective while being mindful of the political realities. Based on experience in both industry and academe, I will share lessons learned as a statistical leader by both assignment and influence. The time is now to advance ubiquitous data-driven decisions, and statisticians are particularly well-suited to make the most of this opportunity.

Sally C. Morton is Dean of the College of Science, Interim Director of the Fralin Life Sciences Institute, and Professor of Statistics at Virginia Tech. Her methodological work focuses on evidence synthesis, particularly meta-analysis, and patient-centered comparative effectiveness research. Previously, she was chair of the Biostatistics Department at the University of Pittsburgh, vice president for statistics and epidemiology at RTI International, and head of the RAND Corporation Statistics Group. She is currently a member of the National Center for Health Statistics Board of Scientific Counselors, the National Collaborative on Gun Violence Research Advisory Committee, and the Patient-Centered Outcomes Research Institute Methodology Committee. Dr. Morton served as the 2009 president of the American Statistical Association and received a PhD in statistics from Stanford University.

Cedric Neumann

Quantification of the Probative Value of Pattern and Trace Evidence in Forensic Science

Monday, March 2, 2020
4:00 p.m.
2006 Sheridan Rd, B02

Abstract: Forensic scientists usually perform one of three tasks: (1) reconstruction, where they attempt to infer events that might have taken place at a crime scene; (2) investigation, where they attempt to establish a list of the potential donors of a given trace; and (3) evaluation, where they attempt to determine if a particular trace was made by a specific source.

The determination that a particular trace originates (or not) from a specific source involves considering two mutually exclusive propositions: H_p - the trace originates from the considered source; and H_d - the trace originates from a different source in a population of potential sources. This can be represented as a non-nested model selection problem. In that case, under H_p, the trace is a random sample from the specific source considered by the scientist; under H_d, the trace is a random sample from an unknown source in the considered population.

Forensic science relies on many different types of evidence to support the detection of crime and the identification of criminals. Most people are aware that forensic scientists are concerned with the analysis of fingerprints, DNA, firearms and drugs. It is less well known that forensic scientists also analyse evidence material such as ear impressions, paint fragments or dust particles. The evaluation of simple DNA evidence is well understood due to the simplicity of the features considered and our ability to develop probabilistic models based on genetics theory. Unfortunately, likelihood-based inference and model selection for pattern and trace evidence involve considering high-dimensional heterogeneous random vectors for which likelihoods do not exist.

During this talk, I will present two methods that we are currently developing and that enable the quantification of the weight of pattern and trace evidence in forensic science. The first method relies on the well-known Approximate Bayesian Computation (ABC) algorithm. Our implementation improves on existing implementations of this algorithm by using the Receiver Operating Characteristic (ROC) curve to remove the need to choose a threshold and to alleviate the curse of dimensionality affecting this family of algorithms. The second method involves using kernel functions to express the similarity between pairs of objects as a score, and modelling the distributions of vectors of scores under H_p and H_d. This method can be seen as a probabilistic multi-class version of the well-known Support-Vector Machines. I will present some examples of the application of these techniques to fingerprint and paint evidence.
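As background for the ABC algorithm mentioned in the abstract, here is a minimal sketch of the basic rejection form of Approximate Bayesian Computation in Python. It is not the speaker's implementation: it relies on a fixed acceptance threshold (`epsilon`), which is exactly the choice the ROC-based approach is said to remove. The toy problem (inferring the mean of a normal distribution), the prior, the summary statistic, and all names are assumptions made for illustration.

```python
import numpy as np


def abc_rejection(observed, prior_sampler, simulator, summary,
                  epsilon, n_draws, rng):
    """Basic ABC rejection: keep prior draws whose simulated summary
    lands within `epsilon` of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)               # draw a candidate parameter
        s_sim = summary(simulator(theta, rng))   # simulate and summarize
        if abs(s_sim - s_obs) <= epsilon:        # no likelihood is evaluated
            accepted.append(theta)
    return np.array(accepted)


# Hypothetical toy problem: unknown mean, known unit variance.
rng = np.random.default_rng(0)
observed = rng.normal(loc=1.5, scale=1.0, size=50)

posterior_draws = abc_rejection(
    observed,
    prior_sampler=lambda g: g.uniform(-5.0, 5.0),
    simulator=lambda theta, g: g.normal(loc=theta, scale=1.0, size=50),
    summary=np.mean,
    epsilon=0.1,          # assumed threshold; results depend on this choice
    n_draws=20000,
    rng=rng,
)
print("accepted draws:", len(posterior_draws))
print("approximate posterior mean:", posterior_draws.mean())
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is why this family of methods is attractive when, as the abstract notes, likelihoods are unavailable.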

Cedric Neumann was awarded a PhD in Forensic Science from the University of Lausanne, Switzerland. From 2004 to 2010, Cedric worked at the Forensic Science Service (FSS) in the United Kingdom. As head of the R&D Statistics and Interpretation Research Group, he contributed to the development of the first validated fingerprint statistical model. This model was used to support the admissibility of fingerprint evidence in U.S. courts.

Cedric is currently an Associate Professor of Statistics at South Dakota State University (SDSU). Cedric's research focuses on the statistical interpretation of forensic evidence, more specifically fingerprint, shoeprint and trace evidence. Cedric has taught multiple workshops for forensic scientists and lawyers alike. Cedric served on the Scientific Working Group for Friction Ridge Analysis, Study and Technology (SWGFAST), was a member of the Board of Directors of the International Association for Identification (IAI) and is the resident statistician of the Chemistry/Instrumental Analysis area committee of the NIST-Organisation of Scientific Area Committees (NIST-OSAC).

Cedric serves on several Editorial Boards, including Forensic Science International and Law, Probability and Risk. Cedric is the 2009 ENFSI Emerging Forensic Scientist and the 2016 SDSU Berg Young Faculty.

Jianqing Fan

Professor of Statistics, Frederick L. Moore '18 Professor of Finance, Princeton University

Communication-Efficient Accurate Statistical Estimation

Wednesday, October 16, 2019
3:00 p.m.
Annenberg Hall G15, 2120 Campus Drive

Abstract

When the data are stored in a distributed manner, direct application of traditional statistical inference procedures is often prohibitive due to communication cost and privacy concerns. This paper develops and investigates two Communication-Efficient Accurate Statistical Estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts the aggregated gradient vector to node machines for new updates. The algorithms adapt to the similarity among loss functions on node machines, and converge rapidly when each node machine has a large enough sample size. Moreover, they do not require good initialization and enjoy linear convergence guarantees under general conditions. The contraction rate of optimization errors is derived explicitly, with dependence on the local sample size unveiled. In addition, the improved statistical accuracy per iteration is derived. By regarding the proposed method as a multi-step statistical estimator, we show that statistical efficiency can be achieved in finite steps in typical statistical applications. In addition, we give the conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.

(Joint work with Yongyi Guo and Kaizheng Wang)
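The communication pattern described in the abstract (local computation on node machines, aggregation of gradients at a central processor, broadcast of the result) can be illustrated with a minimal Python sketch. This is plain distributed gradient descent on a least-squares loss, not the CEASE algorithm itself, and the nodes are simulated in a single process. The data, step size, number of rounds, and function names are all assumptions.

```python
import numpy as np


def local_gradient(X, y, beta):
    """Gradient of the local least-squares loss (1 / 2n) * ||y - X beta||^2."""
    return X.T @ (X @ beta - y) / len(y)


def distributed_gradient_descent(shards, dim, step_size=0.5, n_rounds=200):
    """Each round: nodes compute local gradients, the center averages them,
    and the averaged gradient is broadcast for the next update."""
    beta = np.zeros(dim)
    for _ in range(n_rounds):
        # Node machines: one gradient per shard (in parallel in practice).
        grads = [local_gradient(X, y, beta) for X, y in shards]
        # Central processor: aggregate; only `dim` numbers travel per node.
        aggregated = np.mean(grads, axis=0)
        # Broadcast: every node applies the same update.
        beta = beta - step_size * aggregated
    return beta


# Hypothetical data: 4 nodes with 250 observations each.
rng = np.random.default_rng(0)
true_beta = np.array([1.0, -2.0, 0.5])
shards = []
for _ in range(4):
    X = rng.normal(size=(250, 3))
    y = X @ true_beta + 0.1 * rng.normal(size=250)
    shards.append((X, y))

beta_hat = distributed_gradient_descent(shards, dim=3)
print("estimate:", beta_hat)
```

Because the shards are equal in size, the averaged local gradient equals the gradient of the pooled loss, so the iterates approach the pooled least-squares estimate while raw data never leave the nodes.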

Jianqing Fan is a statistician, financial econometrician, and data scientist. He is Frederick L. Moore '18 Professor of Finance, Professor of Statistics, and Professor of Operations Research and Financial Engineering at Princeton University. He is the winner of the 2000 COPSS Presidents' Award, the Morningside Gold Medal for Applied Mathematics (2007), a Guggenheim Fellowship (2009), the Pao-Lu Hsu Prize (2013) and the Guy Medal in Silver (2014). He was elected an Academician of Academia Sinica in 2012. Fan’s research interests include statistical theory and methods in data science, statistical machine learning, finance, economics, computational biology, and biostatistics, with particular expertise in high-dimensional statistics, nonparametric modeling, longitudinal and functional data analysis, nonlinear time series, and wavelets. Read more about him at: https://orfe.princeton.edu/~jqfan/

Essential Concepts of Causal Inference - A Remarkable History

Donald B. Rubin
John L. Loeb Professor of Statistics, Harvard University

Tuesday, December 5, 2017
3:00 p.m.
Transportation Center, 600 Foster Street

Reception to follow

Abstract
I believe that a deep understanding of cause and effect, and how to estimate causal effects from data, complete with the associated mathematical notation and expressions, only evolved in the twentieth century. The crucial idea of randomized experiments was apparently first proposed in 1925 in the context of agricultural field trials but quickly moved to be applied also in studies of animal breeding and then in industrial manufacturing. The conceptual understanding seemed to be tied to ideas that were developing in quantum mechanics. The key ideas of randomized experiments evidently were not applied to studies of human beings until the 1950s, when such experiments began to be used in controlled medical trials, and then in social science, in education and economics. Humans are more complex than plants and animals, however, and with such trials came the attendant complexities of non-compliance with assigned treatment and the occurrence of Hawthorne and placebo effects. The formal application of the insights from earlier simpler experimental settings to more complex ones dealing with people started in the 1970s and continues to this day, and includes the bridging of classical mathematical ideas of experimentation, including fractional replication and geometrical formulations from the early twentieth century, with modern ideas that rely on powerful computing to implement aspects of design and analysis.

Donald B. Rubin is the John L. Loeb Professor of Statistics at Harvard University. Professor Rubin’s work in causal inference, missing data, matching and applied Bayesian inference essentially defined new fields of statistics. His methods are now embedded in statistical software used by virtually all empirical scientists, and his books “Statistical Analysis with Missing Data,” “Multiple Imputation for Nonresponse in Surveys,” “Matched Sampling for Causal Effects” and “Applied Bayesian Inference” are essential reference works. Read more about him at: statistics.fas.harvard.edu/people/donald-b-rubin
