Everyone and their mother is buying Nvidia H100s and B200s like they’re going out of style. Google? They’ve been doing their own thing for years with custom Tensor Processing Units, and they’re not stopping now.
Last year we got the seventh-gen Ironwood TPU. This year, Google is moving on to the eighth generation with a twist: two distinct chips instead of one.
The TPU 8t is built for training. The TPU 8i is built for inference. Google is framing this split around the idea that we’re entering the “agentic era”—meaning AI systems that actually do things (book flights, run code, interact with APIs) rather than just generating text or images. Their argument is that agent workloads have fundamentally different hardware demands than the training-heavy, chat-heavy paradigm we’ve been stuck in.
Training a frontier model is still a beast. Google says the TPU 8t can cut training time from months down to weeks. That’s a big claim, and I’d love to see independent benchmarks, but Google has been iterating on TPUs long enough that I’m not dismissing it out of hand.
The TPU 8i is the more interesting chip to me. Inference is where the money actually flows once a model is deployed, and agent loops mean you’re running inference constantly—calling models, checking outputs, making decisions, calling again. That’s a very different compute profile than batch processing prompts. If the 8i is genuinely optimized for that kind of low-latency, high-throughput work, it could make Google Cloud a more compelling option for companies building real agent systems.
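To make that loop concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `StubModel`, `run_tool`, and the `TOOL:`/`FINAL:` reply format are stand-ins I made up for illustration, not any real Google or agent-framework API. The point is only that a single task turns into several sequential inference calls, each waiting on the previous one, which is why per-call latency matters so much more here than in batch prompting.

```python
from dataclasses import dataclass


@dataclass
class StubModel:
    """Hypothetical stand-in for a hosted model endpoint; counts calls."""
    calls: int = 0

    def generate(self, prompt: str) -> str:
        self.calls += 1
        # Assumption for the demo: the model asks for three tool calls,
        # then returns a final answer on the fourth call.
        if self.calls < 4:
            return f"TOOL:lookup step {self.calls}"
        return "FINAL:done"


def run_tool(request: str) -> str:
    """Hypothetical tool executor (e.g. an API call); here it just echoes."""
    return f"result of {request}"


def agent_loop(model: StubModel, task: str, max_steps: int = 10) -> tuple[str, int]:
    """Call the model, check its output, act on it, and call again."""
    context = task
    for step in range(1, max_steps + 1):
        reply = model.generate(context)            # one inference call per step
        if reply.startswith("FINAL:"):             # check the output, decide to stop
            return reply.removeprefix("FINAL:"), step
        tool_request = reply.removeprefix("TOOL:")
        context += "\n" + run_tool(tool_request)   # feed the tool result back in
    return "gave up", max_steps                    # step budget exhausted


answer, inference_calls = agent_loop(StubModel(), "book a flight")
print(answer, inference_calls)  # prints: done 4
```

Even in this toy version, one user request costs four model calls back to back, and none of them can start until the one before it returns. Scale that to real agents with dozens of steps and the hardware question stops being about peak throughput alone.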
Of course, Nvidia isn’t sitting still. Their Grace Hopper and Blackwell platforms are also trying to handle inference more efficiently. But Google’s advantage is vertical integration: they control the chip, the networking, the software stack (JAX, TensorFlow), and the cloud platform. That lets them tune hardware and software together in a way that Nvidia customers, who assemble those layers from different vendors, can’t.
I’m curious how pricing shakes out. Google has historically priced TPUs aggressively to pull workloads onto their cloud. If these new chips are genuinely better for agent-style inference, they might actually win some converts who are tired of the Nvidia tax.
One thing I wish Google had clarified: how much of this is architectural versus just process node improvements? The press materials are light on transistor counts or die sizes. Given that they’re announcing two separate chips, I suspect there’s real architectural divergence here, not just binning.
We’ll see if the agent era lives up to the hype. But Google is making a clear bet that it will, and they’re building hardware to match. That’s more than most companies are doing.