Winter 2025 Seminar Series

Department of Statistics and Data Science 2024-2025 Seminar Series - Winter 2025

The 2024-2025 Seminar Series will be held primarily in person, but some talks will be offered virtually via Zoom. Virtual talks will be clearly designated, and registration is required to receive the Zoom link for the event. Please email Kisa Kowal at k-kowal@northwestern.edu if you have questions.

Seminar Series talks are free and open to faculty, graduate students, and advanced undergraduate students.


Leveraging multi-study, multi-outcome data to improve external validity and efficiency of clinical trials for managing schizophrenia

Friday, January 17, 2025

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Caleb H. Miles, Assistant Professor of Biostatistics, Columbia University Mailman School of Public Health

Abstract: As data sources have become more plentiful and readily accessible, the practice of data fusion has become increasingly ubiquitous. However, when the focus is on a causal effect on a particular outcome, a major limitation is that this outcome may not be available in all data sources. In fact, different randomized experiments or observational studies of a common exposure will often focus on potentially related, yet distinct outcomes. One such example is the Database of Cognitive Training and Remediation Studies (DoCTRS), which consists of several randomized trials of the effect of cognitive remediation therapy on various outcomes among patients with schizophrenia. We develop causally principled methodology for fusing data sets when multiple outcomes are observed across studies, which leverages outcomes of secondary interest as informative proxies for the missing outcome of primary interest, thereby maximizing power and efficiency by making full use of the available data. As this methodology relies on a key transportability assumption, we also develop methods to assess the degree of sensitivity to violations of this assumption. We apply this methodology to data from the DoCTRS trials to make improved causal inferences about the effectiveness of cognitive remediation therapy on cognition among patients with schizophrenia.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/624722

The Role of AI in Scientific Discovery: Opportunities and Limitations

Friday, January 24, 2025

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Xiangliang Zhang, Leonard C. Bettex Collegiate Professor of Computer Science, University of Notre Dame

Abstract: Artificial Intelligence (AI) is reshaping the landscape of scientific discovery, enabling breakthroughs across diverse fields. However, when these AI tools are applied to scientific problems, gaps and mismatches often arise. The inherent uncertainty in scientific phenomena, coupled with issues like data quality, biases, and interpretability, poses significant challenges. This talk will discuss the transformative potential of AI in scientific discovery, focusing on its applications in predictive modeling, generative tasks, optimization strategies, and literature analysis. Examples will include AI models ranging from traditional neural networks to large language models (LLMs). At the same time, their limitations will be critically examined, calling for collaboration between the AI and scientific communities to address these challenges and unlock AI’s full potential in advancing scientific discovery.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/624723

TBA

Friday, January 31, 2025

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Yuefeng Han, Assistant Professor, Department of Applied and Computational Mathematics and Statistics, University of Notre Dame

Abstract: TBA

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

TBA

Friday, February 7, 2025

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Tanya Berger-Wolf, Professor, Computer Science and Engineering and Director, Translational Data Analytics Institute, The Ohio State University

Abstract: TBA

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

Towards Data-efficient Training of Large Language Models (LLMs)

Friday, February 14, 2025

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Virtual talk, registration required (link below)

Speaker: Baharan Mirzasoleiman, Assistant Professor, Computer Science Department, UCLA

Abstract: High-quality data is crucial for training LLMs with superior performance. In this talk, I will present two theoretically rigorous approaches to find smaller subsets of examples that can improve the performance and efficiency of training LLMs. First, I will present a one-shot data selection method for supervised fine-tuning of LLMs. Then, I'll talk about an iterative data selection strategy to pretrain or fine-tune LLMs on imbalanced mixtures of language data. I'll conclude by showing empirical results confirming that the above data selection strategies can effectively improve the performance of various LLMs during fine-tuning and pretraining.

This talk will be given virtually by Zoom. Registration is required to receive the Zoom link for the talk.

Register here

TBA

Friday, February 21, 2025

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: TBA

Abstract: TBA

This talk will be given in person on Northwestern's Evanston campus at the location listed above.