Self host Gemma 4: Deploy LLMs on Cloud Run GPUs

Name: Self host Gemma 4: Deploy LLMs on Cloud Run GPUs
Uploaded: 2026-04-18T15:47:23.000Z
Channel: Google Cloud Tech

Gemma 4, Cloud Run, GPU, Ollama, vLLM, LLM deployment, Google Cloud, Serverless, Cloud Storage FUSE, Cloud Build, CI/CD, AI inference, Machine learning, Google Cloud Platform

Google Cloud Tech April 18, 2026

AI summary

A practical tutorial showing two ways to deploy Google's Gemma 4 LLM on Cloud Run with GPUs. The first uses Ollama with the model packaged inside the container image, which gives fast cold starts but requires an image rebuild for every model update. The second uses vLLM with Cloud Storage FUSE, which decouples the model from the image at the cost of a slower first boot. Both approaches scale serverlessly and automate CI/CD through Cloud Build. Ideal for developers building production AI applications on GCP.
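The difference between the two approaches shows up mainly in the deploy command. The sketch below is not taken from the video. It only builds and prints plausible `gcloud run deploy` argument lists so the two can be compared. The service names, image paths, bucket name, mount path, and the CPU and memory sizes are all hypothetical placeholders, and the summary does not say which GPU type is used, so `nvidia-l4` is an assumption.

```python
import shlex
from typing import List


def base_deploy_args(service: str, image: str, region: str) -> List[str]:
    """Flags shared by both approaches: one GPU, CPU always allocated.

    The GPU type and the CPU/memory sizes are assumptions, not values
    confirmed by the source.
    """
    return [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        "--gpu=1",
        "--gpu-type=nvidia-l4",
        "--cpu=8",
        "--memory=32Gi",
        "--no-cpu-throttling",
    ]


def ollama_deploy_args(service: str, image: str, region: str) -> List[str]:
    """Ollama approach: the weights live inside the image, so no volume is
    mounted. Updating the model means rebuilding and redeploying the image."""
    return base_deploy_args(service, image, region)


def vllm_deploy_args(service: str, image: str, region: str,
                     bucket: str, mount_path: str = "/models") -> List[str]:
    """vLLM approach: the weights sit in a Cloud Storage bucket that is
    mounted into the container, so the image stays model-agnostic."""
    return base_deploy_args(service, image, region) + [
        f"--add-volume=name=model-weights,type=cloud-storage,bucket={bucket}",
        f"--add-volume-mount=volume=model-weights,mount-path={mount_path}",
    ]


# Hypothetical names, for illustration only. Nothing is executed here.
ollama_cmd = ollama_deploy_args(
    "gemma-ollama", "REGION-docker.pkg.dev/PROJECT/REPO/ollama-gemma", "REGION")
vllm_cmd = vllm_deploy_args(
    "gemma-vllm", "REGION-docker.pkg.dev/PROJECT/REPO/vllm", "REGION",
    bucket="MODEL_BUCKET")

print(shlex.join(ollama_cmd))
print(shlex.join(vllm_cmd))
```

The only structural difference is the pair of volume flags. With the bucket mounted, a model update is a file upload rather than an image rebuild, but the first request after a cold start has to read the weights from the bucket, which is consistent with the slower first boot the summary mentions.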