Open Source Distributed AI Inference (llm-d/vLLM) Meetup

Name: Open Source Distributed AI Inference (llm-d/vLLM) Meetup
Start: 2026-05-28T17:00
End: 2026-05-28T20:30

Date & Time

May 28, 2026, 5 p.m. - May 28, 2026, 8:30 p.m.

Cost

$0

Location

Cambridge, MA

Organizer

Open Source Distributed AI Inference (llm-d/vLLM) Meetup Boston/Cambridge

Hosted by Google Cloud, Red Hat AI, and the llm-d Community

Date: Thursday, May 28th, 2026

Event Overview

Join us at Google’s Cambridge office for a deep dive into the latest advancements in open-source distributed inference. This session will focus on the evolution of llm-d, ranging from the upcoming 0.7 release to specialized hardware acceleration on TPUs and NVIDIA GPUs.

What to Expect

Deep technical sessions from llm-d maintainers, committers, and teams using AI at scale
Live demos focused on real distributed workflows
Great networking with food and drinks

Who Should Attend

ML and Infrastructure Engineers focused on high-throughput serving.
Platform Architects building GenAI stacks on Kubernetes or Cloud.
Open-source contributors interested in the future of distributed LLM orchestration.

Meetup Agenda

Agenda is preliminary and subject to speaker confirmation

5:00pm — Doors Open & Check-In

Security check-in, networking, and light refreshments.

5:30pm — Intro to llm-d & The 0.7 Roadmap

Speaker: Tyler Michael Smith - llm-d Core Maintainer, Red Hat
Topic: An overview of the llm-d 0.7 release, the future of distributed inference, and how the community is evolving to meet the demands of next-gen model architectures.

6:00pm — Achieve state-of-the-art inference: High performance on TPUs and GPUs with llm-d

Speaker: Kaushik Mitra, Google Cloud Engineering
Topic: This session dives deep into how to architect disaggregated serving and automatic key-value cache storage tiering on Ironwood (TPU7x). Learn to implement routing optimized for service-level objectives and build a portable, high-performance inference fleet that scales automatically based on real-time server conditions.

6:30pm — Using llm-d for Efficient Inference at Scale

Speaker: Peter Tanski, Capital One
Topic: Peter will share lessons learned from implementing llm-d as a central component for serving open-source LLMs at scale, covering GPU utilization, mixed workloads, and efficient inference.

7:00pm — Additional Topics (still TBD)

We are drafting a list of brief updates to cover live:

Inference performance analysis with Prism: https://prism.llm-d.ai, Sean Horgan
KV Cache offloading
TPU7x overview, Liat Berry

7:30pm — Networking, Food, and Drinks 🍕🤝

Deep-dive conversations with the maintainers and the local Boston/Cambridge AI community.

8:30pm — Event Ends
