Self host Gemma 4: Deploy LLMs on Cloud Run GPUs

Google Cloud Tech
AI summary

A practical tutorial showing two methods to deploy Google's Gemma 4 LLM on Cloud Run with GPUs: Ollama (containerized model) for fast cold starts but requiring rebuilds for updates, and vLLM with Cloud Storage FUSE for model decoupling at the cost of slower first boot. Both achieve serverless scaling and automated CI/CD through Cloud Build. Ideal for developers building production AI applications on GCP.