Evolving KServe: The Unified Model Inference Platform for Both Predictive and... F. Spolti & J. Lee
KServe, Model serving, Kubernetes, Machine learning, MLOps, LLM inference, Generative AI, CNCF, Envoy AI Gateway, Cloud native, ML infrastructure
This CNCF talk explores KServe's evolution from a predictive AI serving platform to a unified solution for generative AI workloads. The session covers production challenges for LLMs, including inference efficiency, distributed execution, and cost optimization, and introduces new features such as the llm-d CRD for LLM serving and disaggregated inference architectures. Ideal for ML platform engineers and cloud architects building model serving infrastructure on Kubernetes.