Goodfire, a San Francisco–based startup, just dropped a tool called Silico that does something I’ve wanted for years: it lets you open up an AI model while it’s training and actually turn the knobs. Not just audit it after the fact, but intervene mid-process. That’s a bigger deal than most people realize.
The company claims Silico is the first off-the-shelf tool that can debug models at every stage, from dataset construction through training. Their pitch is simple: building LLMs shouldn’t feel like alchemy. We’ve all seen ChatGPT or Gemini do impressive things, but nobody really knows why they work the way they do. That ignorance makes fixing hallucinations or blocking toxic outputs a guessing game.
“We saw this widening gap between how well models were understood and just how widely they were being deployed,” Goodfire’s CEO Eric Ho told MIT Technology Review. “I think the dominant feeling in every single major frontier lab today is that you just need more scale, more compute, more data, and then you get AGI and nothing else matters. And we’re saying no, there’s a better way.”
Goodfire is part of a small group, alongside Anthropic, OpenAI, and Google DeepMind, pushing mechanistic interpretability, which aims to map out what’s happening inside a model neuron by neuron. MIT Technology Review even named it one of 2026’s 10 Breakthrough Technologies. But Goodfire wants to go further than just auditing finished models. They want to use interpretability during the design phase.
“We want to remove the trial and error and turn training models into precision engineering,” Ho says. “And that means exposing the knobs and dials so that you can actually use them during the training process.”
They’ve already used their techniques to reduce hallucinations in LLMs. With Silico, they’re packaging those in-house methods into a product. The tool uses AI agents to automate a lot of the heavy lifting. “Agents are now strong enough to do a lot of the interpretability work that we were doing using humans,” Ho says. “That was kind of the gap that needed to be bridged before this was actually a viable platform.”
Leonard Bereska, a researcher at the University of Amsterdam, thinks Silico looks useful but pushes back on the company’s grander claims. “In reality, they are adding precision to the alchemy,” he says. “Calling it engineering makes it sound more principled than it is.” I think he’s right to be skeptical—we’ve seen plenty of tools promise to tame black boxes before.
Here’s how Silico works: you zoom in on specific neurons or groups inside a trained model (assuming you have access to the model’s internals—so no poking around in ChatGPT, but plenty of open-source models work). You can check what inputs fire different neurons, trace pathways upstream and downstream, and see how they influence each other.
In one example, Goodfire found a neuron in Qwen 3 that was tied to the trolley problem. Activating it made the model frame everything as a moral dilemma. “When this neuron’s active, all sorts of weird things happen,” Ho says. That kind of pinpointing is becoming standard, but Silico goes a step further and lets you adjust the parameters tied to those neurons to boost or suppress specific behaviors.
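To make the "which inputs fire this neuron" idea concrete, here's a toy sketch in Python. It is not Silico's interface and not how Qwen 3 works internally: the single "neuron", its word weights, and the prompts are all made up for illustration. It only shows the basic move of running inputs through a unit and ranking them by how strongly it activates.

```python
# Toy illustration only: a hypothetical "neuron" modeled as a ReLU over a
# weighted sum of word matches. The weights and prompts are invented.
NEURON_WEIGHTS = {"trolley": 2.0, "lever": 1.5, "dilemma": 1.0, "weather": -0.5}


def activation(text):
    """Return the toy neuron's activation for a piece of text."""
    words = text.lower().split()
    total = sum(NEURON_WEIGHTS.get(word, 0.0) for word in words)
    return max(0.0, total)  # ReLU: negative sums are clipped to zero


prompts = [
    "pull the lever to divert the trolley",
    "what is the weather today",
    "a moral dilemma about honesty",
]

# Rank prompts from most to least activating for this neuron.
ranked = sorted(prompts, key=activation, reverse=True)

for prompt in ranked:
    print(f"{activation(prompt):.1f}  {prompt}")
```

In a real model the activations would come from the network's hidden layers rather than a word lookup, but the workflow is the same: feed in inputs, record the activation, and look at what sits at the top of the list.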
Another demo was more striking. Researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing negative business impact. By boosting neurons associated with transparency and disclosure, they flipped the answer from no to yes nine out of ten times. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” Ho says.
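Here's a rough way to picture that kind of flip, as a toy Python sketch. Everything in it is an assumption for illustration: the two feature names, their activations, the weights, and the `decide` function are hypothetical and have nothing to do with Silico's actual interface or the model Goodfire tested. The point is just that a signal can already be present and still lose the vote until you scale it up.

```python
# Toy illustration only: hypothetical features and made-up numbers.
def decide(activations, weights, boost=None):
    """Answer 'yes' if the weighted sum of feature activations is positive.

    `boost` maps a feature name to a multiplier applied to its activation,
    which stands in for amplifying a group of neurons.
    """
    boost = boost or {}
    score = 0.0
    for name, act in activations.items():
        scale = boost.get(name, 1.0)
        score += weights[name] * act * scale
    return "yes" if score > 0 else "no"


# The "disclose" signal exists, but the commercial-risk signal outweighs it.
activations = {"transparency": 0.4, "commercial_risk": 0.9}
weights = {"transparency": 1.0, "commercial_risk": -1.0}

baseline = decide(activations, weights)
steered = decide(activations, weights, boost={"transparency": 3.0})

print("baseline:", baseline)
print("steered: ", steered)
```

Nothing about the weights changes between the two calls. Only the strength of one existing signal does, which is the gist of Ho's point about the circuitry already being there.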
Silico can also steer training by filtering out the data that sets unwanted parameters in the first place. For example, models often claim that 9.11 is greater than 9.9. Looking inside might reveal neurons associated with Bible verses (where 9.9 comes before 9.11) or software version numbering (9.9, 9.10, 9.11). In both of those contexts 9.11 really does come after 9.9, so the model may be carrying that ordering over to decimal numbers. You can then adjust the training data to avoid encoding that confusion.
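The two orderings are easy to see side by side. This short Python sketch compares the same pair of strings as decimal numbers and as dotted version (or chapter-and-verse) identifiers. The helper names are my own, and it only illustrates why the two readings disagree, not how Silico detects or fixes the problem.

```python
def as_decimal(text):
    """Read '9.11' as the decimal number 9.11."""
    return float(text)


def as_version(text):
    """Read '9.11' as (9, 11), the way version numbers or verses are ordered."""
    return tuple(int(part) for part in text.split("."))


a, b = "9.11", "9.9"

decimal_says_a_is_bigger = as_decimal(a) > as_decimal(b)
version_says_a_is_bigger = as_version(a) > as_version(b)

print("as decimals:", decimal_says_a_is_bigger)
print("as versions:", version_says_a_is_bigger)
```

Same characters, opposite answers. A model that has seen a lot of changelogs and scripture references has plenty of examples where the second reading is the right one.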
This is powerful stuff, but I’m not ready to call it engineering yet. The field is still young, and the tools are only as good as our understanding of what those neurons actually represent. Still, Silico feels like a genuine step forward—not a magic bullet, but a real improvement over throwing more compute at problems and hoping they go away.