Fall 2024 Seminar Series: Department of Statistics and Data Science

Fall 2024 At a Glance

Friday, October 4 @11:00am
Mathematics in Scientific Machine Learning
Rebecca Willett

Friday, October 11 @11:00am
How to Detect Out-of-Distribution Data in the Wild?
Sharon Y. Li

Friday, October 18 @11:00am
A holistic and critical look at language agents
Yu Su

Friday, October 25 @11:00am
Subsampling for Big Data Regression with Measurement Constraints
Lin Wang

Friday, November 1 @11:00am
Autonomous Learning: Unifying OOD Detection and Continual Learning
Bing Liu

Friday, November 15 @11:00am
Quantum Computation and Statistics
Yazhen Wang

Friday, November 22 @11:00am
Structure-driven design of reinforcement learning algorithms: a tale of two estimators
Wenlong Mou

Department of Statistics and Data Science 2024-2025 Seminar Series - Fall 2024

The 2024-2025 Seminar Series will be held primarily in person, but some talks will be offered virtually via Zoom. Virtual talks will be clearly designated, and registration is required to receive the Zoom link for the event. Please email Kisa Kowal at k-kowal@northwestern.edu if you have questions.

Seminar Series talks are free and open to faculty, graduate students, and advanced undergraduate students.

Mathematics in Scientific Machine Learning

Friday, October 4, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Rebecca Willett, Professor of Statistics and Computer Science and Faculty Director of AI at the Data Science Institute, University of Chicago

Abstract: Artificial intelligence (AI) and machine learning (ML) are poised to revolutionize the pace and nature of scientific discovery. The widespread adoption of AI in the sciences has the potential to integrate scientific inquiry with modes of hypothesis generation, data analysis, experimental design, and simulation, transforming our capacity to address scientific problems that currently seem insurmountable. The mathematical foundations of AI and ML are crucial for high-quality, reproducible, AI-enabled scientific research. However, blindly applying AI and ML poses significant risks, such as the rapid acceleration of the “reproducibility crisis” in science. In this talk, I will discuss fundamental machine learning challenges and opportunities that are particularly relevant to scientific discovery, such as emulators, generative models, and inverse problems. These problems underscore the importance of incorporating mathematical and physical models as well as numerical algorithms into ML frameworks, highlighting exciting directions for future work.

This talk will be given in person on Northwestern's Evanston campus.

This talk is co-sponsored by the Department of Engineering Sciences and Applied Mathematics in McCormick.

https://planitpurple.northwestern.edu/event/619082

How to Detect Out-of-Distribution Data in the Wild?

Friday, October 11, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Sharon Y. Li, Assistant Professor in the Department of Computer Sciences, University of Wisconsin-Madison

Abstract: When machine learning models are deployed in the open and non-stationary world, their reliability is often challenged by the presence of out-of-distribution (OOD) samples. Since data shifts are prevalent in the real world, identifying OOD inputs has become an important problem in machine learning. In this talk, I will discuss challenges, research progress, and opportunities in OOD detection. Our work is motivated by the insufficiency of existing learning objectives such as empirical risk minimization (ERM), which focuses on minimizing error only on the in-distribution (ID) data but does not explicitly account for the uncertainty that arises outside ID data. To mitigate this fundamental limitation, I will introduce a new algorithmic framework, which jointly optimizes for both accurate classification of ID samples and reliable detection of OOD data. The learning framework integrates distributional uncertainty as a first-class construct in the learning process, thus enabling both accuracy and safety guarantees.

This talk will be given in person on Northwestern's Evanston campus.

https://planitpurple.northwestern.edu/event/620540

A holistic and critical look at language agents

Friday, October 18, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Yu Su, Assistant Professor, Department of Computer Science and Engineering, The Ohio State University

Abstract: How are contemporary AI agents powered by LLMs different from those of earlier generations? I argue that their most distinctive trait is a new capability of using language as a vehicle for both "thought" and communication, and therefore they are best called "language agents." I will describe a conceptual framework for these language agents, followed by a more in-depth discussion of several core competencies, including memory, planning, and tool use. I will conclude the talk with interesting future directions.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/621506

Subsampling for Big Data Regression with Measurement Constraints

Friday, October 25, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Lin Wang, Assistant Professor of Statistics, Purdue University

Abstract: Despite the availability of extensive data sets, it is often impractical to observe the labels for all data points due to various measurement constraints in many applications. To address this challenge, subsampling approaches can be employed to select a subset of design points from a large pool for observation, resulting in substantial savings in labeling costs. In this presentation, I will introduce our recent research on computationally feasible subsampling techniques. Our primary focus is on regression with labeled data, which includes linear regression, ridge regression, and nonparametric additive regression. For these regression tasks, we have developed sampling approaches that aim to minimize the mean squared error in estimations and predictions. We will demonstrate the effectiveness of our proposed approaches through theoretical analysis and extensive numerical results.

This talk will be given in person on Northwestern's Evanston campus.

https://planitpurple.northwestern.edu/event/621507

Autonomous Learning: Unifying OOD Detection and Continual Learning

Friday, November 1, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Bing Liu, Distinguished Professor and Peter L. and Deborah K. Wexler Professor of Computing at the University of Illinois Chicago

Abstract: Continual learning (CL) focuses on incrementally learning a sequence of tasks, with class incremental learning (CIL) being one of the most challenging settings. This talk begins by presenting a theoretical study of the CIL problem. The key result is that the necessary and sufficient conditions for effective CIL are strong within-task prediction and reliable out-of-distribution (OOD) detection. The theory unifies CIL and OOD detection, which are regarded as two completely different problems. Building on the theory, new CIL methods have been developed, which significantly outperform existing baselines. However, traditional CIL operates in a closed-world context. We then extend the theory to the open world—where unknown and out-of-distribution objects are encountered—leading to the learning paradigm of open-world CIL, or open-world continual learning (OWCL), enabling autonomous learning. In the last part of the talk, I will discuss challenges in OWCL and present a prototype system that learns on the fly continually and autonomously after deployment.

This talk will be given in person on Northwestern's Evanston campus.

https://planitpurple.northwestern.edu/event/621777

Quantum Computation and Statistics

Friday, November 15, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Yazhen Wang, Department of Statistics, University of Wisconsin

Abstract: Quantum computation and quantum information are of great current interest across various fields, including computer science, mathematics and statistics, physical sciences and engineering. As the theory of quantum physics is fundamentally stochastic, quantum computation and quantum information are inherently infused with elements of randomness and uncertainty. Consequently, quantum algorithms are random in nature. This highlights the important role for statistics to play in the realm of quantum computation, which in turn offers great potential to revolutionize computational statistics. In this talk, I will provide an overview of quantum computation and statistics, covering the fundamental concepts and exploring quantum advantage along with the role of statistics and the implications for statistics.

This talk will be given in person on Northwestern's Evanston campus.

https://planitpurple.northwestern.edu/event/620875

Structure-driven design of reinforcement learning algorithms: a tale of two estimators

Friday, November 22, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Wenlong Mou, Assistant Professor of Statistical Sciences, University of Toronto

Abstract: Reinforcement learning (RL) offers a flexible framework for sequential decision-making in uncertain environments, and its success heavily depends on efficiently learning value functions. Over the years, a diverse range of RL algorithms has been proposed, but at their core, two foundational principles stand out: to solve the Bellman fixed-point equations (known as "bootstrapping methods"), or to average the rollout rewards. Despite their success, finding the optimal trade-off between these principles in practical applications remains elusive. Current theoretical guarantees, either worst-case or asymptotic, often fall short of providing actionable insights.

In this talk, I will discuss recent advances in methods that optimally reconcile bootstrapping and rollout for policy evaluation. The bulk of this talk will focus on a new class of estimators that strikes an optimal balance between temporal difference learning and Monte Carlo methods. Through the statistical lens, I will highlight why the local structure of the underlying Markov chain determines the fundamental complexity for estimation, and how our estimator adapts to these structures. Extending this perspective to continuous-time RL, I will also explore how the elliptic structure of diffusion processes provides key insights for making algorithmic choices.

This talk will be given in person on Northwestern's Evanston campus.

https://planitpurple.northwestern.edu/event/621779