Keynote: Rules of the Road for Shared GPUs: AI Inference Scheduling at Wa...
M. Muralikrishnan (ASL)
Kubernetes, GPU scheduling, AI inference, Kueue, Multi-tenant Kubernetes, MLOps, Wayve, Cloud Native, Kubernetes scheduling, GPU cluster management
This keynote describes how Wayve uses Kueue to schedule and allocate resources for multi-tenant AI inference workloads on Kubernetes. It covers the challenges of running diverse inference workloads on shared GPU clusters, from latency-sensitive evaluation to large-scale synthetic data generation, and offers practical guidance for platform engineers and MLOps teams managing GPU scheduling at scale.
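The abstract does not describe Wayve's actual configuration. As a rough illustration of the quota-sharing idea that Kueue is built around, the following is a toy Python model. It is not Kueue's API or its real admission algorithm. In this model, each tenant queue has a guaranteed ("nominal") GPU quota, queues in one shared pool may borrow GPUs the others leave idle (up to an optional borrowing limit), and higher-priority workloads are considered first. All names (`TenantQueue`, `Workload`, `admit`) and the tenant and workload values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Workload:
    name: str
    queue: str      # hypothetical tenant queue this workload is submitted to
    gpus: int       # number of GPUs requested
    priority: int   # higher values are considered for admission first


@dataclass
class TenantQueue:
    name: str
    nominal_gpus: int                       # GPUs guaranteed to this tenant
    borrowing_limit: Optional[int] = None   # max GPUs above nominal; None = unlimited
    used_gpus: int = 0                      # GPUs held by admitted workloads


def admit(queues: List[TenantQueue],
          pending: List[Workload]) -> Tuple[List[str], List[str]]:
    """Toy admission pass over one pool of tenant queues that share quota.

    A workload is admitted only if (1) the pool as a whole still has idle GPUs
    for it and (2) its own queue stays within nominal quota plus borrowing limit.
    Pending workloads are visited by descending priority; Python's sort is
    stable, so equal-priority workloads keep their submission order.
    """
    by_name = {q.name: q for q in queues}
    pool_capacity = sum(q.nominal_gpus for q in queues)
    admitted: List[str] = []
    waiting: List[str] = []
    for wl in sorted(pending, key=lambda w: -w.priority):
        q = by_name[wl.queue]
        pool_used = sum(x.used_gpus for x in queues)
        fits_pool = pool_used + wl.gpus <= pool_capacity
        limit = q.borrowing_limit
        fits_queue = limit is None or q.used_gpus + wl.gpus <= q.nominal_gpus + limit
        if fits_pool and fits_queue:
            q.used_gpus += wl.gpus
            admitted.append(wl.name)
        else:
            waiting.append(wl.name)
    return admitted, waiting


if __name__ == "__main__":
    # Hypothetical tenants: a small latency-sensitive evaluation queue and a
    # larger synthetic-data-generation queue sharing 16 GPUs.
    queues = [TenantQueue("eval", 4, borrowing_limit=4), TenantQueue("simgen", 12)]
    pending = [
        Workload("simgen-batch-1", "simgen", 10, priority=0),
        Workload("eval-run-1", "eval", 2, priority=100),
        Workload("simgen-batch-2", "simgen", 4, priority=0),
        Workload("eval-run-2", "eval", 4, priority=100),
    ]
    print(admit(queues, pending))
```

In this run, both evaluation workloads are admitted first, and the eval queue borrows two GPUs beyond its nominal four, so `simgen-batch-2` waits even though the simgen queue is below its own nominal quota. Kueue itself can also preempt borrowing workloads so a queue can reclaim its nominal quota; this sketch deliberately leaves that out.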