
Winter 2026 Seminar Series

Department of Statistics and Data Science 2025-2026 Seminar Series - Winter 2026

The 2025-2026 Seminar Series will primarily be in person, but some talks will be offered virtually via Zoom. Virtual talks will be clearly designated, and registration will be required to receive the Zoom link for the event. Please email Kisa Kowal at k-kowal@northwestern.edu if you have questions.

Seminar Series talks are free and open to faculty, graduate students, and advanced undergraduate students.

Towards the Last Mile of Artificial General Intelligence: Open-World Long-Tailed Learning in Theory and Practice

Friday, January 23, 2026

Time: 11:00 a.m. to 12:00 p.m. Central Time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Dawei Zhou, Assistant Professor, Department of Computer Science, Virginia Tech

Abstract: Artificial General Intelligence (AGI) represents the next generation of AI that can match or exceed human intelligence across a wide spectrum of tasks. Despite remarkable advances, today's AI systems succeed mainly in data-rich, well-structured settings: identifying common objects, summarizing routine content, or responding to typical queries. They struggle precisely where intelligence matters most, in rare, high-stakes, and context-dependent scenarios such as scientific discovery, open-world cybersecurity, and rare disease diagnosis. We argue that this shortfall defines the Last Mile Problem on the path to AGI, which we frame as Open-World Long-Tailed Learning (OpenLT): how can we enable AI systems to reason, adapt, and generalize across underrepresented, evolving, and open-ended domains? In this talk, I will discuss our group's recent work on 1) OpenLT Characterization: how can we systematically characterize and uncover novel, complex patterns in open-world data?; 2) OpenLT Adaptation: how can AI models be effectively adapted to open and dynamic environments?; and 3) OpenLT Application and Deployment: using scientific hypothesis generation for 3D metamaterial design as a case study for our proposed techniques and theoretical results for open-world long-tailed learning. Finally, I will close with thoughts on how addressing the Last Mile Problem can shape the next decade of AGI research and move us closer to systems that truly understand and operate in the open world.

This talk will be given in person on Northwestern's Evanston campus.

planitpurple.northwestern.edu/event/636717

Graph Neural Network Meets Random Geometric Graph

Friday, February 13, 2026

Time: 11:00 a.m. to 12:00 p.m. Central Time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Suqi Liu, Assistant Professor, Department of Statistics, University of California, Riverside

Abstract: Graph neural networks (GNNs) have emerged as a powerful framework for learning from graph-structured data, yet their theoretical understanding—particularly regarding the behavior of different architectural choices across various graph-based tasks—remains limited. In parallel, random geometric graphs (RGGs) provide a well-defined probabilistic model that captures the interplay between geometry and connectivity in complex networks. In this talk, I will discuss several efforts I have undertaken to bridge these two perspectives by studying GNNs through the lens of RGGs. In the first part, I will focus on the classic graph matching problem and show that, by leveraging a specific GNN, perfect recovery can be achieved even in high-noise regimes. In the second part, I will briefly highlight recent work demonstrating the provable benefits of graph attention networks (GATs) for a node regression task. This talk is based on joint work with Morgane Austern, Kenny Gu, and Somak Laha.
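
For intuition, an RGG in its simplest form places n points uniformly at random in a metric space and connects every pair within distance r. A minimal sketch in Python (the unit square and the specific radius below are illustrative choices, not taken from the talk):

    import numpy as np

    def random_geometric_graph(n, r, d=2, seed=0):
        # Sample n points uniformly in the unit cube [0, 1]^d.
        rng = np.random.default_rng(seed)
        points = rng.random((n, d))
        # Connect each pair of distinct points at Euclidean distance below r.
        diffs = points[:, None, :] - points[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        adjacency = (dists < r) & ~np.eye(n, dtype=bool)
        return points, adjacency

    points, adjacency = random_geometric_graph(n=200, r=0.15)
    print(adjacency.sum() // 2, "edges among", len(points), "nodes")

The latent positions here are exactly the interplay between geometry and connectivity that the abstract refers to: the graph is observed, while the underlying geometry is hidden.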

This talk will be given in person on Northwestern's Evanston campus.

planitpurple.northwestern.edu/event/638864

Deep Survival Learning for Kidney Transplantation: Knowledge Distillation and Data Integration

Friday, February 20, 2026

Time: 11:00 a.m. to 12:00 p.m. Central Time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Kevin He, Associate Professor of Biostatistics and Associate Director of the Kidney Epidemiology and Cost Center (KECC), University of Michigan

Abstract: Prognostic prediction using survival analysis faces challenges due to complex relationships between risk factors and time-to-event outcomes. Deep learning methods have shown promise in addressing these challenges, but their effectiveness often relies on large datasets. However, when applied to moderate- or small-sized datasets, deep models frequently encounter limitations such as insufficient training data, overfitting, and difficulty in hyperparameter optimization. To mitigate these issues and enhance prognostic performance, this talk presents a flexible deep learning framework that integrates external risk scores with internal time-to-event data through a generalized Kullback–Leibler divergence regularization term. Applied to the national kidney transplant data, the proposed method demonstrates improved prediction of short-term mortality and graft failure following kidney transplantation by distilling and transferring prior knowledge from pre-policy-change teacher models to newly arrived post-policy-change cohorts.
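
As a rough illustration of the distillation idea (a sketch under assumptions, not the speaker's implementation): the training objective can combine a standard survival loss on the internal cohort with a Kullback–Leibler penalty pulling the student's risk scores toward the external teacher's. The Cox partial likelihood, the softmax normalization of the scores, and the weight lam below are all illustrative assumptions:

    import numpy as np

    def cox_neg_log_partial_likelihood(risk, time, event):
        # risk: predicted log-risk per subject; time: follow-up time; event: 1 if failure observed.
        order = np.argsort(-time)  # sort by descending time, so each risk set is a prefix
        risk, event = risk[order], event[order]
        log_risk_set = np.logaddexp.accumulate(risk)  # log sum of exp(risk) over each risk set
        return -np.sum((risk - log_risk_set)[event == 1])

    def distilled_loss(student_risk, teacher_risk, time, event, lam=1.0):
        # Survival loss on internal data plus a KL penalty pulling the student's
        # softmax-normalized risk scores toward the external teacher's scores.
        p = np.exp(teacher_risk - np.logaddexp.reduce(teacher_risk))
        q = np.exp(student_risk - np.logaddexp.reduce(student_risk))
        kl = np.sum(p * np.log(p / q))
        return cox_neg_log_partial_likelihood(student_risk, time, event) + lam * kl

    # Toy usage with simulated data:
    rng = np.random.default_rng(1)
    t = rng.exponential(size=50)
    e = rng.integers(0, 2, size=50)
    print(distilled_loss(rng.normal(size=50), rng.normal(size=50), t, e, lam=0.5))

The weight lam plays the role of the regularization strength: large values keep the student close to the pre-policy-change teacher, while small values let the post-policy-change data dominate.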

This talk will be given in person on Northwestern's Evanston campus.

planitpurple.northwestern.edu/event/636129

What functions does XGBoost learn?

Friday, February 27, 2026

Time: 11:00 a.m. to 12:00 p.m. Central Time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Aditya Guntuboyina, Associate Professor, Department of Statistics, University of California, Berkeley

Abstract: We develop a theoretical framework that explains what kinds of functions XGBoost is able to learn. We introduce an infinite-dimensional function class that extends ensembles of shallow decision trees, along with a natural measure of complexity that generalizes the regularization penalty built into XGBoost. We show that this complexity measure aligns with classical notions of variation—in one dimension it corresponds to total variation, and in higher dimensions it is closely tied to a well-known concept called Hardy–Krause variation. We prove that the best least-squares estimator within this class can always be represented using a finite number of trees, and that it achieves a nearly optimal statistical rate of convergence, avoiding the usual curse of dimensionality. Our work provides the first rigorous description of the function space that underlies XGBoost, clarifies its relationship to classical ideas in nonparametric estimation, and highlights an open question: does the actual XGBoost algorithm itself achieve these optimal guarantees? This is joint work with Dohyeong Ki at UC Berkeley. 
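
For context, the regularization penalty built into XGBoost, which the complexity measure above generalizes, is the per-tree penalty from the original XGBoost paper: for a tree f with T leaves and leaf-weight vector w,

    \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

Here \gamma prices each additional leaf and \lambda shrinks the leaf values. The one-dimensional total variation mentioned in the abstract is the classical quantity \mathrm{TV}(f) = \sup \sum_{i=1}^{k} |f(x_i) - f(x_{i-1})|, the supremum taken over all finite partitions x_0 < \cdots < x_k of the domain.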

This talk will be given in person on Northwestern's Evanston campus.

planitpurple.northwestern.edu/event/636130