Google’s New AI Agents Want to Fix Your Bad Figures and Broken Peer Review

Google Research just dropped two new AI agents aimed at the academic grind: PaperVizAgent (formerly PaperBanana) for drawing figures, and ScholarPeer for automated peer review. Having spent years wrestling with matplotlib and dealing with reviewers who clearly skimmed my abstract, I’ve got opinions.

Let’s start with the problem. Academic research is exploding. More papers, more submissions, more pressure. The peer review system is stretched thin — reviewers are burned out, evaluations are inconsistent, and the whole thing feels like a lottery sometimes. Meanwhile, creating good figures for papers is a genuine skill that many researchers (myself included) never fully master. You can have the best idea in the world, but if your methodology diagram looks like a kindergarten art project, reviewers will notice.

PaperVizAgent tackles the figure problem head-on. You feed it two things: your manuscript’s method section (source context) and a detailed figure caption (communicative intent). Then a team of five specialized AI agents goes to work — a retriever, a planner, a stylist, a visualizer, and a critic. The retriever grabs relevant reference figures from existing literature. The planner organizes the content. The stylist sets aesthetic guidelines. The visualizer renders the image or generates executable Python code for statistical plots. And the critic checks the output against the original text, looping back for refinement if something’s off.
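To make the division of labor concrete, here's a minimal sketch of how such a retrieve-plan-style-render-critique loop could be wired together. Every function name, signature, and the three-round cap below are my own assumptions for illustration; the bodies are stubs, not Google's published interface.

    from dataclasses import dataclass

    # Hypothetical sketch of a PaperVizAgent-style pipeline. Names and interfaces
    # are guesses based on the description in this post, not an actual API.

    @dataclass
    class Critique:
        approved: bool
        notes: str

    def retrieve_references(method_text: str, caption: str) -> list[str]:
        """Retriever: pull reference figures from prior literature (stubbed)."""
        return []

    def plan_figure(method_text: str, caption: str, refs: list[str]) -> dict:
        """Planner: decide which components and relations the figure must show."""
        return {"components": [], "caption": caption, "notes": ""}

    def style_figure(refs: list[str]) -> dict:
        """Stylist: derive aesthetic guidelines (palette, layout, density)."""
        return {"palette": "default"}

    def render_figure(plan: dict, style: dict) -> str:
        """Visualizer: produce an image or executable plotting code (stubbed)."""
        return "figure-draft"

    def critique_figure(figure: str, method_text: str, caption: str) -> Critique:
        """Critic: check the draft against the source text (stubbed as a pass)."""
        return Critique(approved=True, notes="")

    def generate_figure(method_text: str, caption: str, max_rounds: int = 3) -> str:
        refs = retrieve_references(method_text, caption)
        plan = plan_figure(method_text, caption, refs)
        style = style_figure(refs)
        figure = render_figure(plan, style)
        for _ in range(max_rounds):
            verdict = critique_figure(figure, method_text, caption)
            if verdict.approved:
                break
            plan["notes"] = verdict.notes      # fold the critique back into the plan
            figure = render_figure(plan, style)
        return figure

The detail that matters here is the control flow: in this sketch the critic's feedback is folded back into the plan before re-rendering, which is one plausible way to implement the refinement loop described above.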

This iterative refinement is the key. I’ve seen too many AI-generated figures that look pretty but are technically wrong — wrong axis labels, mismatched data, conceptual errors. The critic agent is supposed to catch that. In its own evaluations, Google claims PaperVizAgent consistently beats GPT-Image-1.5, Nano-Banana-Pro, and Paper2Any. That’s a strong claim, but I’d want to see independent benchmarks before I fully trust it.

Now, ScholarPeer is the more ambitious one. It’s a reviewer agent that evaluates papers automatically, including the diagrams embedded in them. The goal is to deliver highly critical, literature-grounded reviews that beat state-of-the-art automated reviewers. That’s a tall order. Peer review isn’t just about checking for technical errors — it’s about judging novelty, significance, and fit. Can an AI really do that?

Google’s approach here is to ground the reviews in existing literature, which helps with factual accuracy. But I’m skeptical about novelty judgment. AI models are trained on past papers, so they tend to favor incremental improvements over truly novel ideas. There’s a real risk of reinforcing the status quo. Still, for catching obvious flaws — missing citations, inconsistent data, poorly explained methods — ScholarPeer could be genuinely useful.
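To make "literature-grounded" a bit more concrete, here's one rough sketch of what it could mean in practice: retrieve related abstracts first, then require the reviewer model to cite them when judging novelty. The function name and prompt wording are entirely my own illustration, not ScholarPeer's actual design.

    # Illustrative only: one way to assemble a literature-grounded review prompt.
    def build_grounded_review_prompt(paper_text: str, related_abstracts: list[str]) -> str:
        context = "\n\n".join(
            f"[{i + 1}] {abstract}" for i, abstract in enumerate(related_abstracts)
        )
        return (
            "You are a critical peer reviewer.\n"
            "Related work (cite entries by number when judging novelty):\n"
            f"{context}\n\n"
            "Paper under review:\n"
            f"{paper_text}\n\n"
            "Assess soundness, novelty relative to the cited work, and clarity. "
            "Flag any claim the paper itself does not support."
        )

Grounding the prompt this way helps with factual accuracy, but whether the model uses those citations critically rather than just echoing them is the hard part.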

The bigger question is whether researchers will actually use these tools. Academics are a conservative bunch. We’ve all seen the horror stories of AI-generated figures with hallucinated data or AI-written reviews that are obviously nonsense. Trust is hard to earn. Google needs to be transparent about failure modes and limitations.

One thing I appreciate: they’re not claiming these agents replace humans. The language is careful — “assist,” “empower,” “streamline.” PaperVizAgent is for generating drafts that humans refine. ScholarPeer is for catching low-hanging fruit before submission. That’s the right framing.

I also like that PaperVizAgent generates executable Python code for statistical plots. That’s smart. It means you can verify the output programmatically, not just trust the rendered image. For methodology diagrams, the agent produces vector graphics that are actually editable. These are small details that show Google understands how academics actually work.
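This is the kind of artifact I mean: a short, self-contained matplotlib script whose numbers you can diff against the manuscript before trusting the rendered image. The data and labels below are placeholders I made up, not actual PaperVizAgent output.

    import matplotlib.pyplot as plt

    # Placeholder numbers -- in real use you'd check these against the paper's tables.
    methods = ["Baseline", "Prior work", "Ours"]
    accuracy = [71.2, 74.8, 78.3]

    # Cheap programmatic sanity checks that a rendered PNG can't give you.
    assert len(methods) == len(accuracy)
    assert all(0.0 <= a <= 100.0 for a in accuracy)

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(methods, accuracy)
    ax.set_ylabel("Accuracy (%)")
    ax.set_ylim(0, 100)
    ax.set_title("Placeholder comparison")
    fig.tight_layout()
    fig.savefig("comparison.pdf")   # vector output stays editable, per the point above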

Will this catch on? Hard to say. The code is open-source, which helps. But adoption depends on how well it integrates into existing workflows. If I have to jump through hoops to get my LaTeX document talking to PaperVizAgent, I’ll probably stick with my messy TikZ code. If it’s a simple API call or a plugin for Overleaf, that changes things.

For ScholarPeer, the bar is even higher. Automated review tools have been tried before — ICLR used a rudimentary system for a while, and it was… not great. The reviews were generic and often missed the point. Google claims ScholarPeer is more rigorous, but I’ll believe it when I see it in action.

Overall, this is a solid step forward. The academic workflow is ripe for automation, but it’s also fragile. One bad experience with a hallucinated figure or a nonsensical review can sour the whole community. Google seems aware of this, but the proof will be in the real-world usage. I’ll be watching closely.
