Policy-as-Code for LLM Inference: Cost & Security Guardrails - Sakalya Deshpande, SAP Labs
Kyverno, Policy as Code, LLM inference, Kubernetes, GPU management, Cloud Native, Cost optimization, MLOps, Admission control, Kubernetes security
This talk demonstrates how Kyverno validating admission policies can enforce cost and security guardrails for LLM inference on Kubernetes before workloads reach the scheduler. Sakalya Deshpande shows practical policies that reject inference requests exceeding token budgets, enforce GPU limits on model serving deployments, and require cost-attribution labels for chargeback. Platform engineers building multi-tenant AI infrastructure will learn declarative policy patterns that require no sidecars or custom controllers.
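As a rough illustration of the checks such policies express, the following Python sketch validates a Deployment manifest the way an admission rule might. It is not Kyverno syntax and is not taken from the talk: the label keys (`cost-center`, `team`), the cap of 2 GPUs per container, and the function name `validate_deployment` are all hypothetical, and `nvidia.com/gpu` is assumed as the GPU resource name.

```python
from typing import List

# Hypothetical values for illustration; the talk does not specify them.
REQUIRED_LABELS = ("cost-center", "team")
MAX_GPUS_PER_CONTAINER = 2
GPU_RESOURCE = "nvidia.com/gpu"  # assumed GPU resource name


def validate_deployment(manifest: dict) -> List[str]:
    """Return violation messages; an empty list means the workload is admitted."""
    violations = []

    # Guardrail 1: cost-attribution labels must be present and non-empty.
    labels = manifest.get("metadata", {}).get("labels") or {}
    for key in REQUIRED_LABELS:
        if not labels.get(key):
            violations.append(f"missing cost-attribution label: {key}")

    # Guardrail 2: each container's GPU limit must not exceed the cap.
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits") or {}
        gpus = int(limits.get(GPU_RESOURCE, 0))
        if gpus > MAX_GPUS_PER_CONTAINER:
            violations.append(
                f"container {container.get('name')} requests {gpus} GPUs; "
                f"cap is {MAX_GPUS_PER_CONTAINER}"
            )
    return violations
```

In a real cluster the same two checks would be written declaratively as policy rules and evaluated by the admission controller, so a non-empty violation list corresponds to a rejected API request.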