Open Source Distributed AI Inference (llm-d/vLLM) Meetup

Date & Time

May 28, 2026, 5:00 p.m. - 8:30 p.m.

Cost

$0

Location

Cambridge, MA


Description

Open Source Distributed AI Inference (llm-d/vLLM) Meetup Boston/Cambridge

Hosted by Google Cloud, Red Hat AI, and the llm-d Community

Date: Thursday, May 28, 2026

Event Overview

Join us at Google’s Cambridge office for a deep dive into the latest advancements in open-source distributed inference. This session covers the evolution of llm-d, from the upcoming 0.7 release to hardware acceleration on TPUs and NVIDIA GPUs.

What to Expect

  • Deep technical sessions from llm-d maintainers, committers, and teams using AI at scale

  • Live demos focused on real distributed workflows

  • Great networking with food and drinks

Who Should Attend

  • ML and Infrastructure Engineers focused on high-throughput serving.

  • Platform Architects building GenAI stacks on Kubernetes or Cloud.

  • Open-source contributors interested in the future of distributed LLM orchestration.

Meetup Agenda

The agenda is preliminary and subject to speaker confirmation.

5:00pm — Doors Open & Check-In

Security check-in, networking, and light refreshments.

5:30pm — Intro to llm-d & the 0.7 Roadmap

  • Speaker: Tyler Michael Smith - llm-d Core Maintainer, Red Hat

  • Topic: An overview of the llm-d 0.7 release, the future of distributed inference, and how the community is evolving to meet the demands of next-gen model architectures.

6:00pm — Achieve state-of-the-art inference: High performance on TPUs and GPUs with llm-d

  • Speaker: Kaushik Mitra, Google Cloud Engineering

  • Topic: This session dives deep into how to architect disaggregated serving and automatic key-value cache storage tiering on Ironwood (TPU7x). Learn to implement routing optimized for service-level objectives and build a portable, high-performance inference fleet that scales automatically based on real-time server conditions.

6:30pm — Using llm-d for Efficient Inference at Scale

  • Speaker: Peter Tanski, Capital One

  • Topic: Peter will share lessons learned from implementing llm-d as a central component for serving open-source LLMs at scale, covering GPU utilization, mixed workloads, and efficient inference.

7:00pm — Additional Topics (still TBD)

We are drafting a list of brief updates to cover live:

  • Inference performance analysis with Prism (https://prism.llm-d.ai), Sean Horgan

  • KV cache offloading

  • TPU7x overview, Liat Berry

7:30pm — Networking, Food, and Drinks 🍕🤝

  • Deep-dive conversations with the maintainers and the local Boston/Cambridge AI community.

8:30pm — Event Ends