Co-designed for Gemini, open to all
This eighth generation of TPU is also the latest expression of our co-design philosophy, where every spec is built to solve AI’s biggest obstacles.
- Boardfly topology was designed specifically for the communication requirements of today’s most capable reasoning models.
- The SRAM capacity of the TPU 8i was sized to the KV cache footprint of production-scale reasoning models.
- The Virgo Network fabric bandwidth was derived from the parallelism requirements of trillion-parameter training.
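To give a sense of why KV cache footprint drives on-chip memory sizing, here is a minimal sketch of the standard KV cache size formula. The function and the model dimensions in the example are illustrative assumptions, not the specs of any TPU or Google model.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for every token in flight.

    The factor of 2 accounts for storing both K and V; bytes_per_elem=2
    assumes bf16 storage. All parameters here are hypothetical.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Example: a hypothetical 80-layer model with 8 KV heads of dimension 128,
# serving a batch of 32 requests at 8K context length in bf16.
footprint_gib = kv_cache_bytes(80, 8, 128, 8192, 32) / 2**30
print(footprint_gib)  # 80.0 GiB
```

Even with grouped-query attention shrinking the KV head count, long contexts and large batches push the cache into the tens of gigabytes, which is why serving-oriented chips budget memory around it.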
And for the first time, both chips run on Google’s own Axion ARM-based CPU host, allowing us to optimize the entire system, not just the chip, for performance and efficiency.
Both platforms natively support JAX, MaxText, PyTorch, SGLang, and vLLM—the frameworks developers already use—and offer bare-metal access, giving customers direct hardware access without the overhead of virtualization. Open-source contributions, including MaxText reference implementations and Tunix for reinforcement learning, bridge the gap between model capability and production deployment.
Designed for large-scale power efficiency
In today’s data centers, power, not just chip supply, is a binding constraint. To address this, we’ve optimized efficiency across the entire stack with integrated power management that dynamically adjusts power consumption based on real-time demand. TPU 8t and TPU 8i deliver up to two times better performance per watt compared to the previous generation, Ironwood.
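Demand-driven power management can be pictured as a small control loop that nudges a power cap toward observed load. The sketch below is purely illustrative of the general technique; the thresholds, step size, and function name are assumptions and do not describe Google's actual mechanism.

```python
def next_power_cap(current_cap: float, utilization: float,
                   min_cap: float, max_cap: float, step: float = 0.05) -> float:
    """Illustrative control step: raise the cap under heavy load,
    lower it when the chip sits mostly idle, always staying in bounds.

    utilization is a 0.0-1.0 fraction; all thresholds are hypothetical.
    """
    if utilization > 0.9:          # near saturation: allow more power
        current_cap *= (1 + step)
    elif utilization < 0.5:        # mostly idle: claw back power
        current_cap *= (1 - step)
    return max(min_cap, min(max_cap, current_cap))

# A busy accelerator drifts toward its maximum cap...
print(next_power_cap(100.0, 0.95, 50.0, 120.0))  # 105.0
# ...while an idle one drifts down, saving facility power.
print(next_power_cap(100.0, 0.30, 50.0, 120.0))  # 95.0
```

Real implementations operate at much finer granularity (per-block clock and voltage scaling), but the fleet-level effect is the same: power tracks demand rather than running flat-out.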
But efficiency at Google isn’t just a chip-level metric; it’s a system-level commitment that runs from the silicon to the data center. For example, we integrate network connectivity with compute on the same chip, significantly reducing the power cost of moving data across the TPU pod. Even our data centers are co-designed with our TPUs. We innovated across hardware and software to enable our data centers to deliver six times more computing power per unit of electricity than they did just five years ago.
The TPU 8t and TPU 8i continue that trajectory. Both are supported by our fourth-generation liquid cooling technology, which sustains performance densities that air cooling cannot. By owning the entire stack, from Axion host to accelerator, we can optimize energy efficiency at the system level in ways that simply cannot be achieved when the host and chip are designed independently.
