Policy-as-Code for LLM Inference: Cost & Security Guardrails - Sakalya Deshpande, SAP Labs

CNCF
AI summary

This talk demonstrates how Kyverno validating admission policies can enforce cost and security guardrails for LLM inference on Kubernetes before workloads reach the scheduler. Sakalya Deshpande shows practical policies that reject inference requests exceeding token budgets, enforce GPU limits on model serving deployments, and require cost-attribution labels for chargeback. Platform engineers building multi-tenant AI infrastructure will learn declarative policy patterns that require no sidecars or custom controllers.
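The three checks named in the summary (token budget, GPU limit, cost-attribution labels) can be sketched as plain validation logic. This is a rough illustration in Python, not Kyverno policy syntax and not the speaker's actual policies. The label keys, the `inference.example.com/max-tokens` annotation, and both numeric limits are hypothetical. Only the `nvidia.com/gpu` resource name is a commonly used real-world convention.

```python
from typing import Any, Optional

# Hypothetical names and limits, chosen for illustration only.
REQUIRED_LABELS = ("cost-center", "team")
TOKEN_BUDGET_ANNOTATION = "inference.example.com/max-tokens"
MAX_TOKEN_BUDGET = 4096
GPU_RESOURCE = "nvidia.com/gpu"
MAX_GPUS_PER_CONTAINER = 2


def _as_int(value: Any) -> Optional[int]:
    """Return value as an int, or None if it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def validate_deployment(manifest: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means 'admit'."""
    violations: list[str] = []
    metadata = manifest.get("metadata") or {}
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}

    # 1. Cost attribution: every required label must be present and non-empty.
    for label in REQUIRED_LABELS:
        if not labels.get(label):
            violations.append(f"missing required label: {label}")

    # 2. Token budget: the declared budget must exist and stay under the cap.
    budget = _as_int(annotations.get(TOKEN_BUDGET_ANNOTATION))
    if budget is None or budget > MAX_TOKEN_BUDGET:
        violations.append(
            f"annotation {TOKEN_BUDGET_ANNOTATION} must be set and <= {MAX_TOKEN_BUDGET}"
        )

    # 3. GPU limits: every container must declare a GPU limit within the cap.
    pod_spec = ((manifest.get("spec") or {}).get("template") or {}).get("spec") or {}
    for container in pod_spec.get("containers") or []:
        limits = (container.get("resources") or {}).get("limits") or {}
        gpus = _as_int(limits.get(GPU_RESOURCE))
        if gpus is None or gpus > MAX_GPUS_PER_CONTAINER:
            name = container.get("name", "<unnamed>")
            violations.append(
                f"container {name}: {GPU_RESOURCE} limit must be set and <= {MAX_GPUS_PER_CONTAINER}"
            )

    return violations
```

In a real cluster the same rules would be written declaratively and evaluated by the admission controller at request time, so a non-empty violation list corresponds to a rejected request.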