
Unveiled in April 2025, Ironwood is built to serve the 'reasoning' models that have become prevalent in advanced AI. Unlike previous TPU generations, which were designed to handle both training and inference, Ironwood is Google's first purpose-built, inference-optimised accelerator.

The Core Philosophy: 'Reasoning' over Speed

Modern AI models (like OpenAI's o1 or Google's Gemini 2.0 Flash Thinking) do not just generate the first answer they think of. They 'think' before they speak, performing an internal chain-of-thought to check their work and reason through complex problems.

The Problem. This 'thinking' demands enormous memory bandwidth (how fast data can be moved in and out of the processor) and compute capacity (how fast the maths can be done).

The Solution. Ironwood is designed to handle the dynamic nature of reasoning. It can process multiple requests in parallel, handle massive context windows (like analysing entire books or codebases at once), and do it all with extreme efficiency.
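
A rough roofline-style calculation shows why memory bandwidth, rather than raw compute, usually limits token generation. All the numbers below are illustrative assumptions (model size, precision, and chip figures), not Ironwood specifications:

```python
# Back-of-the-envelope: why autoregressive 'thinking' is memory-bandwidth bound.
# Illustrative numbers only; these are NOT Ironwood specifications.

params = 100e9            # a 100-billion-parameter model (assumed)
bytes_per_param = 2       # bf16 weights (assumed)
hbm_bandwidth = 7.2e12    # bytes/s of HBM bandwidth (hypothetical)
peak_flops = 4.6e15       # peak FLOP/s (hypothetical)

# Each generated token must stream every weight through the chip once,
# and costs roughly 2 FLOPs per parameter (a multiply and an add).
t_memory = params * bytes_per_param / hbm_bandwidth
t_compute = 2 * params / peak_flops

print(f"memory-limited time/token : {t_memory * 1e3:.2f} ms")   # ~27.8 ms
print(f"compute-limited time/token: {t_compute * 1e3:.3f} ms")  # ~0.043 ms
# Memory time dominates by orders of magnitude, so faster HBM
# (not more FLOPs) is what raises tokens per second.
```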

Architectural Details and Specifications

Performance

Single Chip. A single Ironwood chip delivers 4.6x better peak inference performance than the previous generation (TPU v6, Trillium).

Pod Performance. When scaled up, a single Ironwood 'Pod' (a supercomputer of interconnected chips) can achieve an astronomical 42.5 Exaflops of peak FP8 compute (an exaflop is a quintillion, a billion billion, calculations per second).
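
As a sanity check, the pod figure and the chip count quoted later in this article imply a per-chip peak in the petaflop range; a minimal sketch using only the numbers in this article:

```python
# Sanity check: divide the quoted pod peak by the quoted chip count.
pod_flops = 42.5e18     # 42.5 Exaflops per pod, as quoted above
chips_per_pod = 9216    # chips in one Ironwood Pod (quoted below)

per_chip = pod_flops / chips_per_pod
print(f"{per_chip / 1e15:.2f} petaflops per chip")  # ~4.61 PFLOP/s
```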

Memory Bandwidth

It features a massive 2x improvement in High Bandwidth Memory (HBM) capacity and bandwidth compared to TPU v6. This allows the chip to hold and access the large 'working memory' required by reasoning models without constantly pausing to fetch data.
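
To see why capacity and bandwidth both matter, consider the KV cache a transformer accumulates as its context grows. The sketch below uses hypothetical model dimensions (the layer count, head sizes, and precision are illustrative assumptions, not the specs of any Google model):

```python
# Rough KV-cache sizing: the 'working memory' a long context consumes.
# All model dimensions below are hypothetical, for illustration only.

layers = 80                 # transformer layers
kv_heads = 8                # key/value heads (grouped-query attention)
head_dim = 128              # dimension per head
bytes_per_value = 2         # bf16 precision
context_tokens = 1_000_000  # a million-token context window

# Keys and values are both cached: hence the factor of 2.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
total_bytes = kv_bytes_per_token * context_tokens

print(f"{kv_bytes_per_token / 1024:.0f} KiB per token")    # 320 KiB
print(f"{total_bytes / 1e9:.0f} GB for the full context")  # ~328 GB
```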

Interconnect Speed

Ironwood features a 3x increase in inter-chip interconnect speed. This allows thousands of chips to work together as a single logical accelerator, essential for models that are too large to fit on a single chip.
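
In software, this 'single logical accelerator' view shows up as collective operations that run over the interconnect. A minimal JAX sketch of the idea (the 8-device CPU simulation flag is only there so the example runs without TPUs):

```python
# A minimal JAX sketch of many chips acting as one logical accelerator:
# each device holds a shard of the data, and a cross-device reduction
# (which rides the inter-chip interconnect on real TPU pods) combines
# the partial results.
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import jax
import jax.numpy as jnp

xs = jnp.arange(8.0)  # one scalar shard per simulated device

# pmap maps the function across devices; psum is the collective that
# the interconnect fabric accelerates on real hardware.
total = jax.pmap(lambda x: jax.lax.psum(x, axis_name="i"), axis_name="i")(xs)
print(total)  # every device sees the full sum: [28. 28. ... 28.]
```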

Key Features and Innovations

Dual Write Gather Scaler (DGS)

This is a critical architectural innovation for inference. Modern AI models, particularly those using Mixture-of-Experts (MoE) architectures, rely heavily on 'gather' operations (fetching data from different parts of the memory). The DGS unit accelerates these specific data movement patterns, ensuring the compute engines never sit idle waiting for data.
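
The DGS unit itself is not publicly documented in detail, but the access pattern it targets is easy to illustrate. In an MoE layer, a router assigns each token to an expert, and the hardware must gather each token's expert weights by index; a toy sketch with made-up shapes:

```python
# The gather pattern at the heart of MoE inference: each token fetches
# the weight matrix of the expert a router assigned to it.
import jax
import jax.numpy as jnp

num_experts, d_model, tokens = 16, 64, 8
k1, k2 = jax.random.split(jax.random.PRNGKey(0))

expert_w = jax.random.normal(k1, (num_experts, d_model, d_model))
x = jax.random.normal(k2, (tokens, d_model))
routes = jnp.array([3, 7, 3, 0, 15, 7, 2, 3])  # router's choice per token

# The gather: pull each token's expert weights out of memory by index.
w_per_token = jnp.take(expert_w, routes, axis=0)  # (tokens, d_model, d_model)

# Apply each token's expert with one batched matmul.
y = jnp.einsum("td,tdk->tk", x, w_per_token)
print(y.shape)  # (8, 64)
```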

Massive Scale Up (The Ironwood Pod)

Google designed Ironwood to scale to enormous sizes. A single Ironwood Pod can consist of 9,216 chips interconnected via a custom high-speed optical circuit-switching fabric. This allows the cluster to act as one giant accelerator with:

7,200 Gbps of bandwidth per chip.

42.5 Exaflops of compute power.

14.4 TB of total High Bandwidth Memory (HBM).

Energy Efficiency

Despite the raw power, Google emphasises efficiency. Ironwood is designed to be 2x more energy-efficient than TPU v6, helping Google meet its carbon-neutral goals while running massive AI workloads.

What is Ironwood Used For?

Serving Reasoning Models. Handling the high-volume, iterative processing required by models that fact-check themselves.

Long Context Understanding. Allowing models to process millions of tokens (e.g. summarising entire movie scripts or massive code repositories) in a single pass thanks to the high memory bandwidth.

Multimodal Generation. Powering models that generate images, video, and audio simultaneously, which requires juggling multiple data types at once.


The Physical Connection: Optical Circuit Switching (OCS)

The Mechanism. Instead of electrons travelling through wires, data is sent as light pulses through fibre-optic cables. The revolutionary part is that the paths of these light beams are steered by tiny moving mirrors (MEMS, micro-electromechanical systems).

The 'Beam' Analogy. Imagine 9,216 projectors (chips) and 9,216 screens (other chips). In a traditional system, changing which projector talks to which screen requires a technician to physically unplug and replug cables. In Google's system, a mirror simply tilts to bounce the light onto a different screen, instantly.

This allows Google to dynamically reshape the supercomputer's topology to suit the AI model. If a model needs a massive 4,608-chip mesh, the optics reconfigure to create that specific wiring pattern on the fly. This combination of moving mirrors (OCS), 3D torus wrapping, and blazing-fast optics (7.2 Tbps) is how Ironwood achieves its massive 42.5-Exaflop scale.
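
A toy software model makes the reconfiguration idea concrete: an optical circuit switch behaves like a rewritable mapping from input ports to output ports, so 'recabling' is just an update to that mapping. This is an illustrative sketch, not Google's actual control software:

```python
# Toy model of an optical circuit switch: the mirror array acts as a
# reconfigurable mapping from input ports to output ports. Re-pointing
# a mirror means rewriting one entry, with no physical recabling.

class OpticalCircuitSwitch:
    def __init__(self, num_ports: int):
        # Start with identity wiring: port i -> port i.
        self.mirror = {i: i for i in range(num_ports)}

    def reconfigure(self, in_port: int, out_port: int) -> None:
        """Tilt the mirror on in_port so its beam lands on out_port."""
        self.mirror[in_port] = out_port

    def route(self, in_port: int) -> int:
        return self.mirror[in_port]

ocs = OpticalCircuitSwitch(9216)
ocs.reconfigure(0, 4607)   # chip 0 now talks to chip 4607
print(ocs.route(0))        # 4607
```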

'Calico' (The Network Topology)

Calico is the optical backplane technology. It is not a single chip, but rather the system of fibre optics and connectors that creates a 'spine-and-leaf' topology, built from light rather than copper.

Google engineers have described Calico as allowing them to build a 'disaggregated' supercomputer. The memory and compute are not physically glued together; they are connected by Calico's optical fibres, allowing them to mix and match resources as needed. Ironwood's 9,216 chip pod runs on a Calico fabric.

'Chip to Chip Optics' (The Broadcom Partnership)

The Connection. Broadcom produces the SerDes (Serialiser/Deserialiser) and DSP chips that handle the signal processing for Google's optics.

Ironwood Context. The 7.2 Tbps speed likely relies on Broadcom's latest 100G-per-lane PHY (Physical Layer) technology. While Google designs the MEMS mirrors and the architecture, Broadcom provides the high-speed electrical components that drive the lasers.
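
Taking the article's own figures at face value, the lane count follows from a single division. The 100 Gbps-per-lane figure comes from the paragraph above, not from a confirmed Ironwood specification:

```python
# If the quoted 7.2 Tbps per-chip figure were built from 100 Gbps
# SerDes lanes (an assumption, not a confirmed spec), the lane count
# is a one-line division:
link_bps = 7.2e12   # 7.2 Tbps per chip, as quoted above
lane_bps = 100e9    # 100 Gbps per lane (assumed)
print(int(link_bps / lane_bps), "lanes")  # 72 lanes
```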

The 'Roadrunner' (The Cooling Tech)

Photonic devices are very sensitive to heat. Lasers change wavelength if they get too hot. MEMS mirrors can warp.

Google uses a specialised liquid cooling system internally for these optical components, often referred to in leaks as 'Roadrunner' or part of their 'Champ' system.

To keep those co-packaged optics stable right next to a red-hot TPU, Google has developed advanced thermal management (cold plates) specifically designed to extract heat from the photonic ICs without disturbing their nanometre-scale precision.
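
To get a feel for the sensitivity, a DFB laser's wavelength typically drifts on the order of 0.1 nm per °C (a general rule of thumb for such lasers, not an Ironwood figure), so even modest temperature swings matter:

```python
# Why lasers need tight thermal control: wavelength drift of roughly
# 0.1 nm/°C is a common rule of thumb for DFB lasers (not an Ironwood spec).
drift_nm_per_c = 0.1
delta_t = 25.0                  # a 25 °C swing near a hot accelerator
shift = drift_nm_per_c * delta_t
print(f"{shift:.1f} nm of wavelength shift")  # 2.5 nm, enough to hop
# across several DWDM channel spacings (about 0.8 nm at 100 GHz spacing).
```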


© Photonics.institute Maldwyn Palmer