Keynote: Rules of the Road for Shared GPUs: AI Inference Scheduling at Wa... M. Muralikrishnan (ASL)

CNCF
AI summary

This keynote covers how Wayve manages scheduling and resource allocation for multi-tenant AI inference workloads on Kubernetes using Kueue. It addresses the challenges of running diverse inference workloads, from latency-sensitive evaluation to large-scale synthetic data generation, on shared GPU clusters, and offers practical guidance for platform engineers and MLOps teams who handle GPU scheduling at scale.