DeepSeek V4 is here, and it’s actually interesting — here’s why

DeepSeek just released a preview of V4, its new flagship model, and it’s the kind of release that makes you sit up and pay attention — even if it won’t shake the industry the way R1 did back in January 2025.

If you remember, R1 was the reasoning model that turned DeepSeek from a quiet research outfit into China’s most recognizable AI company almost overnight. It was trained on limited hardware, performed shockingly well, and kicked off a wave of open-weight model releases from other Chinese firms. Since then, DeepSeek has been relatively quiet — until earlier this month when they added “expert” and “flash” modes to their online model, which was basically a teaser for V4.

V4 comes after months of scrutiny: personnel departures, delayed launches, and growing attention from both US and Chinese regulators. So this release is more than just a tech update — it’s a statement.

Let’s talk about what actually matters here.

It’s open source, and it’s cheap

DeepSeek claims V4’s performance rivals the best closed-source models at a fraction of the cost. That’s not just marketing fluff — the pricing is genuinely aggressive. V4-Pro runs $1.74 per million input tokens and $3.48 per million output tokens. V4-Flash is even cheaper: $0.14 per million input tokens and $0.28 per million output tokens. Compare that to what OpenAI or Anthropic charge, and it’s hard not to see the appeal for developers and startups.
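To make those numbers concrete, here’s a quick back-of-the-envelope calculator. The prices are the ones quoted above; the function name and the example workload (a 200,000-token prompt with a 4,000-token reply) are my own assumptions for illustration, not anything from DeepSeek’s documentation.

```python
# List prices quoted in this article, in US dollars per million tokens.
PRICES_PER_MILLION = {
    "V4-Pro": {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request from its token counts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 200,000-token prompt and a 4,000-token reply.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 200_000, 4_000):.4f}")
```

For that workload, the request comes to roughly 36 cents on V4-Pro and about 3 cents on V4-Flash. At these list prices, Pro costs about 12.4 times as much as Flash for both input and output, so the choice comes down to whether the quality gap is worth that multiple for your use case.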

On DeepSeek’s own benchmarks, V4-Pro matches Anthropic’s Claude-Opus-4.6, OpenAI’s GPT-5.4, and Google’s Gemini-3.1. Against open-source rivals like Alibaba’s Qwen-3.5 or Z.ai’s GLM-5.1, it outperforms on coding, math, and STEM problems. DeepSeek also shared an internal survey of 85 experienced developers, and over 90% of them (so at least 77 people) ranked V4-Pro among their top models for coding tasks. It’s a small, in-house sample, but that’s not nothing.

A million-token context window that actually works

Both V4 versions handle up to 1 million tokens. That’s a huge context window: think processing entire codebases or long documents in one go. DeepSeek says they achieved this through a new architecture that’s more memory-efficient than the standard transformer approach. Rather than caching full keys and values for every token, V4 uses Multi-head Latent Attention (MLA), which stores a compressed latent representation of them, combined with a mixture-of-experts (MoE) setup, which activates only a subset of the model’s parameters for each token. The result is that the model can handle long prompts without the usual memory blow-up.
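To get a feel for why attention memory is the bottleneck at this scale, here’s a rough sketch comparing the cache size of standard multi-head attention with an MLA-style compressed cache at 1 million tokens. Every dimension below (layer count, head count, latent size) is a placeholder I picked for illustration, not V4’s actual configuration, and the formulas are the simplified textbook versions, so treat the output as an order-of-magnitude estimate only.

```python
def full_kv_bytes_per_token(layers: int, heads: int, head_dim: int,
                            bytes_per_value: int = 2) -> int:
    """Standard attention: cache one key and one value vector per head, per layer."""
    return 2 * layers * heads * head_dim * bytes_per_value


def latent_kv_bytes_per_token(layers: int, latent_dim: int, rope_dim: int,
                              bytes_per_value: int = 2) -> int:
    """MLA-style cache: one compressed latent plus a small positional key, per layer."""
    return layers * (latent_dim + rope_dim) * bytes_per_value


# Hypothetical model dimensions (assumptions, NOT V4's published config).
CONTEXT_TOKENS = 1_000_000
LAYERS, HEADS, HEAD_DIM = 60, 128, 128
LATENT_DIM, ROPE_DIM = 512, 64

full = full_kv_bytes_per_token(LAYERS, HEADS, HEAD_DIM) * CONTEXT_TOKENS
latent = latent_kv_bytes_per_token(LAYERS, LATENT_DIM, ROPE_DIM) * CONTEXT_TOKENS

print(f"full KV cache:   {full / 1e9:,.1f} GB")
print(f"latent KV cache: {latent / 1e9:,.1f} GB")
print(f"reduction:       {full / latent:.1f}x")
```

With these placeholder numbers, the uncompressed cache for a single 1-million-token prompt lands in the terabyte range, while the compressed one is in the tens of gigabytes. The MoE side of the design doesn’t show up here, since it reduces the compute per token rather than the size of the attention cache.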

This is genuinely useful for agentic tasks — things like code generation across multiple files, or complex problem-solving that requires keeping track of many steps. DeepSeek optimized V4 for popular agent frameworks like Claude Code, OpenClaw, and CodeBuddy. If you’re building AI agents, this is the kind of model you want to test.

The reasoning modes are a nice touch

Both V4-Pro and V4-Flash offer reasoning modes where the model shows its work step by step. It’s not a new idea — R1 had it — but it’s well executed here. For debugging or understanding how the model arrived at a conclusion, this is invaluable.

What’s the catch?

DeepSeek is still under a lot of scrutiny. The US and Chinese governments are both paying close attention, and there have been internal shakeups. The company also hasn’t shared full details on training data or compute — the technical report is thorough but leaves some questions unanswered. And while the benchmarks look great, real-world performance can vary. I’d like to see independent third-party evaluations before calling it a game-changer.

Still, for an open-source model that costs this little and handles this much context, V4 is a serious release. It won’t have the same shock value as R1, but it’s a solid step forward — and that’s more than most model releases can claim these days.
