STATE-Bench - Memory-agnostic Benchmark

AI summary

STATE-Bench is Microsoft's open-source benchmark for evaluating whether memory improves AI agent performance on realistic enterprise tasks in customer support, travel, and shopping. Unlike traditional memory benchmarks, which focus on simple recall, it tests procedural workflows, reliability, efficiency, and user experience, areas that current measures of AI agent memory leave out. The benchmark is memory-agnostic: developers bring their own memory implementation and use the results to assess its production readiness.
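
To make "memory-agnostic" concrete, here is a minimal sketch of the general pattern: a harness that depends only on a small interface, so any memory implementation can be plugged in and compared against a no-memory baseline. Every name here (`Memory`, `NoMemory`, `KeywordMemory`, `run_task`) is hypothetical and chosen for illustration; none of it is STATE-Bench's actual API, and the toy check below is far simpler than the workflow, reliability, and efficiency measures the benchmark describes.

```python
from typing import Protocol


class Memory(Protocol):
    """Hypothetical plug-in interface (an assumption, not STATE-Bench's API)."""

    def store(self, text: str) -> None: ...

    def retrieve(self, query: str) -> list[str]: ...


class NoMemory:
    """Baseline implementation that remembers nothing."""

    def store(self, text: str) -> None:
        pass

    def retrieve(self, query: str) -> list[str]:
        return []


class KeywordMemory:
    """Toy implementation: returns stored notes that share a word with the query."""

    def __init__(self) -> None:
        self._notes: list[str] = []

    def store(self, text: str) -> None:
        self._notes.append(text)

    def retrieve(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [n for n in self._notes if words & set(n.lower().split())]


def run_task(memory: Memory, history: list[str], query: str, expected: str) -> bool:
    """Hypothetical harness step: load past notes into the memory, then
    report whether the expected detail is available when the task needs it."""
    for note in history:
        memory.store(note)
    return any(expected in note for note in memory.retrieve(query))


# The same task is run against two interchangeable memory implementations.
history = ["refund policy allows returns within 30 days"]
baseline = run_task(NoMemory(), history, "what is the refund policy", "30 days")
with_memory = run_task(KeywordMemory(), history, "what is the refund policy", "30 days")
```

Because `run_task` only calls `store` and `retrieve`, swapping in a different memory system means writing one small adapter class rather than changing the harness.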