Date & Time
May 28, 2026, 5 p.m. - May 28, 2026, 8:30 p.m.
Cost
$0
Location
Cambridge, MA
Hosted by Google Cloud, Red Hat AI, and the llm-d Community
Date: Thursday, May 28, 2026
Join us at Google’s Cambridge office for a deep dive into the latest advancements in open-source distributed inference. This event will focus on the evolution of llm-d, from the upcoming 0.7 release to specialized hardware acceleration on TPUs and NVIDIA GPUs.
What to expect
Deep technical sessions from llm-d maintainers, committers, and teams running AI at scale.
Live demos focused on real distributed workflows.
Great networking with food and drinks.
Who should attend
ML and Infrastructure Engineers focused on high-throughput serving.
Platform Architects building GenAI stacks on Kubernetes or in the cloud.
Open-source contributors interested in the future of distributed LLM orchestration.
The agenda is preliminary and subject to speaker confirmation.
5:00pm — Doors Open & Check-In
Security check-in, networking, and light refreshments.
5:30pm — Intro to llm-d & the 0.7 Roadmap
Speaker: Tyler Michael Smith, llm-d Core Maintainer, Red Hat
Topic: An overview of the llm-d 0.7 release, the future of distributed inference, and how the community is evolving to meet the demands of next-gen model architectures.
6:00pm — Achieve state-of-the-art inference: High performance on TPUs and GPUs with llm-d
Speaker: Kaushik Mitra, Google Cloud Engineering
Topic: This session dives deep into how to architect disaggregated serving and automatic key-value cache storage tiering on Ironwood (TPU7x). Learn to implement routing optimized for service-level objectives and build a portable, high-performance inference fleet that scales automatically based on real-time server conditions.
6:30pm — Using llm-d for Efficient Inference at Scale
Speaker: Peter Tanski, Capital One
Topic: Peter will share Capital One's lessons learned from deploying llm-d as a central component for serving open-source LLMs at scale, addressing GPU utilization, mixed workloads, and efficient inference.
7:00pm — Additional Topics (still TBD)
We are drafting a list of brief updates to cover live:
Inference performance analysis with Prism (https://prism.llm-d.ai), Sean Horgan
KV cache offloading
TPU7x overview, Liat Berry
7:30pm — Networking, Food, and Drinks 🍕🤝
Deep-dive conversations with the maintainers and the local Boston/Cambridge AI community.
8:30pm — Event Ends