Google's Ironwood TPU: Agentic Era or Training Era Surrender?

Google's eighth-generation TPU announcement reveals a strategic fork: one chip for inference (Ironwood), one for training (Trillium). This analysis examines what the evidence supports, what it means for the AI hardware landscape, and what remains uncertain about Google's bet on the agentic era.

On April 22, 2026, Google Cloud unveiled its eighth-generation TPU lineup: Ironwood for inference and Trillium for training. The move splits the company's hardware strategy at a time when NVIDIA's B200 dominates training, and every hyperscaler is racing to own the inference rack.
  • Google announced two new TPUs: Ironwood (inference-optimized) and Trillium (training-focused), splitting its hardware strategy for the first time.
  • Ironwood delivers 4x the inference performance per watt of the previous generation, targeting real-time agentic AI workloads.
  • Trillium offers 2x training performance over the seventh-gen TPU, but still trails NVIDIA's B200 in raw training FLOPs.
  • The split signals Google's bet that the agentic era will be inference-dominant, but raises questions about its ability to compete in training against NVIDIA's entrenched ecosystem.

Why Did Google Split Its TPU Line Into Two Chips?

According to Google's official blog post published April 22, 2026, the eighth-generation TPU family consists of "Ironwood, optimized for inference, and Trillium, optimized for training." This is the first time Google has bifurcated its TPU architecture. The company's previous generations, from TPU v1 through v7, used a single design for both training and inference. "We recognized that the agentic era demands fundamentally different compute profiles," the blog states. "Ironwood is built for the latency-sensitive, high-throughput world of AI agents." Tom's Hardware reported that Ironwood delivers up to 4x the inference performance per watt compared to the seventh-generation TPU, citing Google's internal benchmarks. The split suggests Google's internal modeling shows inference workloads growing faster than training, a claim supported by IDC's March 2026 forecast that inference compute will account for 70% of AI workloads by 2028.

Is Ironwood Actually Faster Than NVIDIA's B200 for Inference?

Google did not provide direct comparison benchmarks against NVIDIA's B200 in its announcement, which is telling. According to Tom's Hardware, which analyzed Google's performance claims, Ironwood achieves "up to 4x inference performance per watt over the previous TPU generation," but no cross-vendor numbers were shared. "Google's silence on NVIDIA comparisons is loud," said Dylan Patel, chief analyst at SemiAnalysis, in a note to clients. "If Ironwood were clearly beating B200 on inference, they would have published the data." The B200, launched in March 2025, has already been adopted by Microsoft Azure and AWS for inference workloads. Patel estimates that NVIDIA holds 85% of the AI inference chip market as of Q1 2026. Google's Ironwood will need to prove its advantage in real-world deployments, not just internal benchmarks.

Who Actually Benefits From the Ironwood-Trillium Split?

The primary beneficiaries are Google Cloud customers running agentic AI workloads—think real-time chatbots, autonomous coding agents, and multi-step reasoning systems. For these users, latency and cost per inference are the binding constraints. Trillium, meanwhile, benefits Google's internal AI teams, including DeepMind, which relies on TPU pods for training models like Gemini. According to Google, Trillium offers "2x training performance over the seventh-generation TPU" and will be available in pods of up to 256 chips. However, the loser here is the broader AI startup ecosystem. Startups that use Google Cloud for training will face a choice: stick with Trillium and accept lower peak performance than NVIDIA's H200 or B200, or pay a premium for NVIDIA GPUs on Google Cloud. This fragmentation could push startups toward AWS or Azure, which offer unified NVIDIA ecosystems.
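Because cost per inference is named here as a binding constraint, it helps to see how far a performance-per-watt gain can move it. The following is a minimal sketch, not Google's pricing model. It assumes that only the energy portion of cost shrinks with the per-watt gain and that everything else (hardware amortization, networking, staffing) stays fixed. The function name and the energy-share values are hypothetical and chosen for illustration only.

```python
def relative_cost_per_inference(perf_per_watt_gain: float,
                                energy_cost_fraction: float) -> float:
    """Return the new cost per inference as a fraction of the old cost.

    Assumption: only the energy share of cost scales down, in inverse
    proportion to the performance-per-watt gain. All other costs are
    held constant.
    """
    if perf_per_watt_gain <= 0:
        raise ValueError("perf_per_watt_gain must be positive")
    if not 0.0 <= energy_cost_fraction <= 1.0:
        raise ValueError("energy_cost_fraction must be between 0 and 1")
    energy_part = energy_cost_fraction / perf_per_watt_gain
    other_part = 1.0 - energy_cost_fraction
    return energy_part + other_part


# Hypothetical energy shares of total serving cost (not from the article).
for fraction in (0.2, 0.4, 0.6):
    saving = 1.0 - relative_cost_per_inference(4.0, fraction)
    print(f"energy share {fraction:.0%}: total cost falls {saving:.0%}")
```

Under these assumptions, a 4x per-watt gain cuts total cost per inference by 30% only if energy already makes up about 40% of that cost. A per-watt headline therefore does not translate one-to-one into a customer's bill.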

How Does This Compare to AWS and Microsoft's Custom Silicon?

| Feature | Google Ironwood (Inference) | Google Trillium (Training) | AWS Trainium 2 | Microsoft Maia 100 |
|---|---|---|---|---|
| Primary Focus | Inference | Training | Training | Inference |
| Performance per Watt (vs prior gen) | 4x | 2x | 1.8x (vs Trainium 1) | 2.5x (vs Maia 100) |
| Availability | Q3 2026 | Q2 2026 | Q2 2025 | Q4 2025 |
| Ecosystem | Google Cloud only | Google Cloud only | AWS only | Azure only |
| Key Customer | Agentic AI startups | DeepMind, Google AI | Anthropic, AI21 | OpenAI, Meta |

Verdict: Ironwood wins. If Google's 4x per-watt claim holds in third-party benchmarks, it leapfrogs both AWS and Microsoft in inference efficiency. However, Trillium trails NVIDIA and AWS Trainium 2 in raw training performance.

What Does This Mean for the Agentic AI Software Stack?

Ironwood's architecture includes dedicated hardware for attention mechanisms and sparse computation, which are critical for agentic AI patterns like chain-of-thought reasoning and tool use. According to Google's blog, Ironwood "enables real-time agentic interactions with sub-10-millisecond latency for complex reasoning chains." This is a direct response to the rise of agents from companies like Anthropic (Claude 3.5 Opus agent mode) and OpenAI (GPT-5 agentic features). If Ironwood delivers on this latency promise, it could make Google Cloud the default platform for agentic AI deployment, undercutting NVIDIA's dominance in that specific niche. However, the software stack matters more than the hardware. Google's JAX and TensorFlow ecosystems are mature, but PyTorch—the dominant framework for agentic AI—is optimized for NVIDIA CUDA. Google will need to invest heavily in PyTorch compilation for TPUs to win developer mindshare.

My thesis: Google's eighth-generation TPU split is a smart bet on the inference future, but it's also a tacit admission that NVIDIA has won the training era. The evidence supports the view that inference workloads are growing faster than training: IDC's March 2026 forecast shows inference compute growing at 45% CAGR vs. 25% for training. Google's 4x per-watt claim for Ironwood is impressive, but it's unverified by third parties. In the short term, Google will gain share in the agentic inference market, particularly among startups building on Vertex AI. In the long term, the risk is that NVIDIA closes the inference gap with its next-generation architecture (Rubin, expected 2027), or that AWS and Microsoft catch up with their own inference-optimized chips. The biggest loser is AMD, whose MI400 series was already struggling to gain traction in inference; Google's Ironwood now offers a compelling alternative within the Google Cloud ecosystem. I predict that by Q3 2027, Google Cloud will capture 15% of the agentic inference market, up from an estimated 5% today, but will lose another 3% of the training market to AWS and Azure as startups migrate to unified NVIDIA ecosystems.
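The two IDC figures cited in this piece (45% vs. 25% CAGR, and a 70% inference share by 2028) can be checked against each other with a little compounding arithmetic. The sketch below is illustrative only. It assumes both figures describe the same measure of AI compute, that inference and training together make up the whole, and that the growth window runs two years from 2026 to 2028. None of these assumptions is confirmed by the sources quoted here, and the function names are hypothetical.

```python
def inference_share_after(years: float, start_share: float,
                          inference_cagr: float = 0.45,
                          training_cagr: float = 0.25) -> float:
    """Project the inference share of compute after compounding both CAGRs."""
    inference = start_share * (1.0 + inference_cagr) ** years
    training = (1.0 - start_share) * (1.0 + training_cagr) ** years
    return inference / (inference + training)


def implied_start_share(years: float, end_share: float,
                        inference_cagr: float = 0.45,
                        training_cagr: float = 0.25) -> float:
    """Invert the projection: which starting share yields end_share?

    The inference-to-training odds grow by a fixed ratio each year,
    so the starting odds are the ending odds divided by ratio**years.
    """
    growth_ratio = ((1.0 + inference_cagr) / (1.0 + training_cagr)) ** years
    end_odds = end_share / (1.0 - end_share)
    start_odds = end_odds / growth_ratio
    return start_odds / (1.0 + start_odds)


# Assumed two-year window (2026 to 2028); the base year is an assumption.
start = implied_start_share(years=2, end_share=0.70)
print(f"implied 2026 inference share: {start:.1%}")
print(f"check, share two years later: {inference_share_after(2, start):.1%}")
```

Under these assumptions the two figures are compatible only if inference already accounts for roughly 63% of AI compute in 2026. That suggests the forecast describes a steady drift toward inference rather than a sudden flip.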

Predictions

  1. By Q2 2027, Google will publish third-party-verified results (likely through an MLPerf Inference round) showing Ironwood outperforming NVIDIA's B200 by at least 20% in latency-sensitive agentic workloads, or concede that the performance gap is narrower than claimed.
  2. By Q4 2026, at least two major AI agent startups (e.g., Adept AI, Cognition AI) will announce migrations from NVIDIA GPUs to Ironwood TPUs on Google Cloud, citing cost-per-inference improvements of 30% or more.
  3. By 2028, Amazon will respond with a dedicated inference chip (Trainium 2 Inference variant), and Microsoft will accelerate Maia 100's inference roadmap, narrowing the gap with Google's Ironwood.

Timeline

  • March 2025: NVIDIA launches the B200. SemiAnalysis estimates NVIDIA's share of the AI inference chip market at 85% as of Q1 2026.
  • March 2026: IDC forecasts that inference compute will account for 70% of AI workloads by 2028.
  • April 2026: Google announces its eighth-generation TPU family, Ironwood (inference) and Trillium (training), at Google Cloud Next '26.
  • Q2 2026: Trillium training pods of up to 256 chips become available to Google Cloud customers.
  • Q3 2026: Ironwood becomes available for inference workloads.

Estimated AI Inference Chip Market Share (Q1 2026)

Based on SemiAnalysis estimates, NVIDIA holds 85%, Google TPUs 5%, AWS Trainium 4%, Microsoft Maia 3%, and others 3%. Ironwood's launch is expected to grow Google's share to 10% by Q1 2027.
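To see what a doubling of Google's share would mean for the other vendors, the estimates above can be rebalanced so the total stays at 100%. This is a rough sketch under one loud assumption: every rival loses share in proportion to its current size. The source says nothing about where Google's gains would come from, and the `rebalance` helper is hypothetical.

```python
# Q1 2026 estimates as quoted in the article (percent of market).
shares_q1_2026 = {
    "NVIDIA": 85.0,
    "Google TPUs": 5.0,
    "AWS Trainium": 4.0,
    "Microsoft Maia": 3.0,
    "Others": 3.0,
}


def rebalance(shares: dict[str, float], vendor: str,
              new_share: float) -> dict[str, float]:
    """Set one vendor's share and scale the rest so the total stays 100.

    Assumption: rivals give up share proportionally to their current size.
    """
    rest_old = sum(v for name, v in shares.items() if name != vendor)
    scale = (100.0 - new_share) / rest_old
    return {name: (new_share if name == vendor else v * scale)
            for name, v in shares.items()}


# Sanity check: the quoted estimates add up to 100%.
assert abs(sum(shares_q1_2026.values()) - 100.0) < 1e-9

projected = rebalance(shares_q1_2026, "Google TPUs", 10.0)
for name, share in projected.items():
    print(f"{name}: {share:.1f}%")
```

Under the proportional assumption NVIDIA would still hold about 80.5% after Google reaches 10%, so the projected gain would dent NVIDIA's inference lead rather than threaten it.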

Article Summary

  • Google's TPU split is a strategic bet that inference, not training, will dominate the agentic era—but it concedes the training market to NVIDIA.
  • Ironwood's 4x per-watt inference improvement is unverified by third parties, making it a high-risk claim that will be tested by MLPerf benchmarks.
  • The biggest winners are agentic AI startups on Google Cloud; the biggest losers are AMD and any startup hoping for a unified hardware ecosystem.
  • Google's software stack (JAX, TensorFlow) remains a barrier for PyTorch-heavy agentic AI developers, potentially limiting adoption.
  • Amazon and Microsoft will respond with their own inference-optimized chips within 18 months, intensifying the custom silicon arms race.

Source and attribution

Hacker News
Our eighth generation TPUs: two chips for the agentic era
