Seminars

A Year in LLM Serving: Workload Evolution, Caching and Load-Balancing

Jun 29, 2026 12:30 PM·SEC 2.122 & 2.123·William Nixon (University of Chicago)

Designing effective modern LLM serving systems requires an understanding of realistic workloads, but capturing the complexity of today’s diverse applications is difficult using only short traces or synthetic datasets. William will share insights from a comprehensive one-year production trace of billions of LLM requests, exploring how these workloads evolve and detailing key systems implications for prefix caching and load balancing.

Sketches and Their Applications for Synchronization, Blockchain Networks and Learning

Jun 22, 2026 12:30 PM·SEC 2.122 & 2.123·Ori Rottenstreich (Technion)

Hash-based data structures such as Bloom filters are used throughout network systems for a wide range of tasks. In this talk, Ori will give an overview of several recent designs that expand and enhance their utility across domains such as data synchronization among peers, blockchain networks and machine learning pipelines. The talk is based on recent papers from SIGMETRICS, TNSM, AFT and APNet.

Systems Seminar: Round Table Discussion

Jun 15, 2026 12:30 PM·SEC 2.122 & 2.123

A new addition to our Systems Seminar: an open round table discussion where people are encouraged to share recent papers, tech news, new tools, or open questions in a relaxed, conversational setting.

Building Scalable Distributed Databases in the Age of Geo-Replication

Jun 8, 2026 12:30 PM·SEC 2.122 & 2.123·Yunhao Mao (University of Toronto)

Modern distributed applications depend heavily on geo-replication for fault tolerance, but high network latency forces these databases to make difficult tradeoffs between the high performance of weak consistency and the data safety of strong consistency. Yunhao will explore solutions to these challenges by detailing advancements in Conflict-free Replicated Data Types (CRDTs), including the Janus implementation, and introducing Minerva, a scalable transaction protocol designed to maintain high throughput across wide-area networks.

Systems at the Crossroad of Agents & Infrastructure (MLSys ’26 Digest Talk)

Jun 1, 2026 12:30 PM·SEC 2.122 & 2.123·Yiyu Liu (Harvard University)

As large language models transition into autonomous agentic systems, traditional serving frameworks face unprecedented performance bottlenecks. Yiyu will deliver a structured digest of cutting-edge research from MLSys 2026 covering agentic AI, LLM systems, and compilers.

Inference and AI Infrastructure (Special Event for #BosTechWeek)

May 26, 2026 9:00 AM·SEC 3.301, 3.302 & 3.303·Venkat Pullela (Keysight) & Tushar Krishna (Georgia Tech)

The market's focus has shifted from building and training LLMs to serving them and running inference efficiently. In this meetup we will discuss the AI inference stack and why optimizing it is a multidimensional problem, the role of hardware-software co-design, and how applications are the only deliverable. We will also take a deep dive into MLCommons Chakra, a framework for capturing AI application execution graphs.

Firefly: Scalable, Ultra-Accurate Clock Synchronization for Datacenters

May 18, 2026 12:30 PM·SEC 2.122 & 2.123·Yuliang Li (Google)

Achieving the sub-10ns clock synchronization required by cloud-based financial exchanges is increasingly difficult because existing methods are often vulnerable to jitter, drift, and the complexities of large-scale network paths. Yuliang will showcase Firefly, a software-driven system that leverages a distributed consensus algorithm and a novel layered synchronization technique to provide resilient, high-precision time alignment across modern datacenters.

High-Dimensional Gradient-Free Optimization for Neuroscience, Interpretability and LLMs: Why It Works and How to Make It Better

Apr 22, 2026 12:45 PM·SEC 2.122 & 2.123·Binxu Wang (Harvard University)

Evolution strategies (ES) provide a vital gradient-free alternative for solving complex, high-dimensional optimization problems in fields like neuroscience and LLM fine-tuning where traditional backpropagation is often unavailable or inefficient. Binxu Wang will explore the geometric properties that enable these methods to succeed and demonstrate how identifying task-irrelevant parameter directions can be leveraged to further accelerate optimization in modern large-scale models.

Collapsing Towers of Interpreters for Security

Apr 15, 2026 12:45 PM·SEC 2.122 & 2.123·Cameron Wong (Harvard University)

Staged metaprogramming offers an alternative to traditional macros by using annotations to control execution timing, allowing programs to be specialized for known inputs by delaying specific computations. Cameron Wong will demonstrate how this technique can derive compilers from interpreters and reify low-level hardware behaviors to detect side-channel vulnerabilities, while also discussing future applications in address sanitization and decompilation.

Practical End-to-End Privacy and Data Use Policy Enforcement

Mar 25, 2026 12:45 PM·SEC 2.122 & 2.123·Malte Schwarzkopf (Brown University)

While modern data is governed by strict privacy policies, developers often lack the practical abstractions needed to ensure their code complies, leading to frequent manual errors and costly violations. Malte Schwarzkopf will introduce Sesame, a framework that utilizes policy containers and static analysis to automate end-to-end privacy enforcement with minimal developer effort and low performance overhead.