At the edge, our E2B and E4B models redefine on-device utility, prioritizing multimodal capabilities, low-latency processing, and seamless ecosystem integration over raw parameter count.
Powerful, accessible, open
To power the next generation of cutting-edge research and products, we’ve tailored the Gemma 4 models to run and fine-tune efficiently across a wide range of hardware, from billions of Android devices worldwide and portable GPUs up to developer workstations and accelerators.
Using these highly optimized models, you can fine-tune Gemma 4 to achieve advanced performance on your specific tasks. We have already seen incredible success with this approach: INSAIT created BgGPT, a pioneering Bulgarian-first-language model, and we worked with Yale University on Cell2Sentence-Scale to discover new avenues for cancer therapy, among many others.
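As a concrete starting point, here is a minimal fine-tuning sketch using Hugging Face transformers with peft LoRA adapters, one common approach for adapting an open model on a single GPU. The model identifier below is a placeholder invented for illustration; substitute the actual Gemma 4 checkpoint you are using.

```python
# Minimal LoRA fine-tuning sketch with transformers + peft.
# "google/gemma-4-e4b-it" is a hypothetical identifier for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-4-e4b-it"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Attach small low-rank adapters instead of updating all weights,
# so the job fits comfortably on a single consumer GPU.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

From here, a standard causal-LM training loop or the transformers Trainer applies unchanged; because only the adapter weights are updated, single-GPU fine-tuning stays practical even for the larger checkpoints.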
Here’s what makes Gemma 4 our most capable open model family yet:
- Advanced Reasoning: Built for multi-step planning and deep logic, Gemma 4 demonstrates significant improvements on math and instruction-following benchmarks.
- Agent Workflows: Built-in support for function calling, structured JSON output, and native system instructions lets you build autonomous agents that interact with tools and APIs and execute workflows reliably (see the structured-output sketch after this list).
- Code Generation: Gemma 4 generates high-quality code fully offline, turning your workstation into a local-first AI coding assistant.
- Sight and sound: All models natively process video and images, support variable resolutions, and excel at visual tasks such as OCR and diagram comprehension. In addition, the E2B and E4B models accept native audio input for speech recognition and understanding (a minimal image-input sketch follows this list).
- Longer context: Process long-form content seamlessly. The edge models have a 128K-token context window, while the larger models offer up to 256K tokens, so you can send entire archives or long documents in a single prompt.
- 140+ languages: Trained on data spanning more than 140 languages, Gemma 4 helps developers build inclusive, high-performance applications for a global audience.
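To make the agent-workflow bullet concrete, here is a hedged sketch of coaxing structured JSON out of the model with a system instruction, using the standard transformers chat-template API. The model id and the little weather schema are invented for illustration, not part of any shipped tool definition.

```python
# Hedged sketch: structured JSON output via a system instruction.
# "google/gemma-4-e4b-it" is a hypothetical identifier; apply_chat_template
# and generate are standard transformers APIs.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-e4b-it"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system",
     "content": 'Respond ONLY with JSON matching {"city": string, "celsius": number}.'},
    {"role": "user", "content": "What's the temperature in Zurich right now?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, then parse them as JSON.
reply = tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True)
payload = json.loads(reply)  # hand off to your tool, e.g. report_weather(**payload)
```

In a real agent loop you would validate the parsed JSON against your tool’s schema before executing the call.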
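And for the sight-and-sound bullet, a minimal image-input sketch using the transformers image-text-to-text pipeline; again, the model id and image URL are placeholders.

```python
# Minimal image-understanding sketch via the transformers
# "image-text-to-text" pipeline. Model id and image URL are placeholders.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-4-e4b-it")  # hypothetical id

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/receipt.png"},
        {"type": "text", "text": "Read the total amount and the date off this receipt."},
    ],
}]
result = pipe(text=messages, max_new_tokens=64)
print(result[0]["generated_text"])
```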
Versatile models for different hardware
We’re releasing Gemma 4 in a range of sizes tailored to specific hardware and use cases, so you get boundary-breaking reasoning wherever you need it:
26B and 31B models: Frontier intelligence, offline on your personal computers
Optimized to provide researchers and developers with state-of-the-art reasoning on readily available hardware, the unquantized bfloat16 weights fit efficiently on a single 80GB NVIDIA H100 GPU. For local setups, quantized versions run natively on consumer GPUs to power your IDEs, coding assistants, and agent workflows, as sketched below. The 26B Mixture of Experts (MoE) model is built for latency: it activates only 3.8 billion of its total parameters during inference to deliver exceptionally fast tokens per second, while the 31B dense model maximizes raw quality and provides a powerful foundation for fine-tuning.
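For reference, here is a minimal sketch of loading a 4-bit quantized variant on a consumer GPU with transformers and bitsandbytes. The model id is a placeholder, and actual VRAM requirements depend on the checkpoint and quantization scheme you choose.

```python
# Minimal sketch: 4-bit quantized local inference via bitsandbytes.
# "google/gemma-4-26b-it" is a hypothetical identifier for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-4-26b-it"  # hypothetical identifier
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # weights in 4-bit, compute in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # spread layers across available GPU memory
)

inputs = tokenizer("Write an iterative Fibonacci function in Python.\n",
                   return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=128)[0],
                       skip_special_tokens=True))
```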
