Keynote: Rules of the Road for Shared GPUs: AI Inference Scheduling at Wa... M. Muralikrishnan (ASL)

Uploaded: 2026-04-26T21:14:38.000Z
Channel: CNCF

Tags: Kubernetes, GPU scheduling, AI inference, Kueue, multi-tenant Kubernetes, MLOps, Wayve, cloud native, Kubernetes scheduling, GPU cluster management


AI summary

This keynote covers how Wayve manages scheduling and resource allocation for multi-tenant AI inference workloads on Kubernetes using Kueue. It addresses the challenges of running diverse inference workloads—from latency-sensitive evaluation to large-scale synthetic data generation—on shared GPU clusters, and provides practical guidance for platform engineers and MLOps teams dealing with GPU scheduling at scale.
