Fifteen months ago, on a long January weekend in 2025, a Hangzhou hedge-fund spinout shipped a reasoning model called R1 and blew something close to a trillion dollars off the U.S. tech complex in a single trading session. Nvidia alone lost about $600 billion in market cap. Pundits called it a Sputnik moment. Then everyone moved on.
On April 24, 2026, the same lab, DeepSeek, finally released the long-awaited successor. It dropped without a glossy keynote, without a livestream, without even a press release that you could call a press release. Just a model card, a paper, and an updated price page. If R1 was the bang, V4 is the part where you realize the explosion left a crater you have to walk around now.
January 2025
R1 ships. ~$1T wiped from U.S. tech. Nvidia loses $600B in market cap in a single session.
April 24, 2026
V4 drops quietly — no keynote, no livestream. Just a model card, a paper, and a price page.
The Difference
R1 was the bang. V4 is the crater you now have to walk around.
What V4 Actually Is
V4 ships in two open-weight Mixture-of-Experts (MoE) variants. V4-Pro has 1.6 trillion total parameters with 49 billion active per token, 61 layers, and 384 routed experts, and it was trained on 33 trillion tokens. V4-Flash has 284B total parameters with 13B active. Both have a native 1-million-token context window, both expose the now-standard Thinking and Non-Thinking modes, and both are released as open weights under a permissive license on Hugging Face.
At 1.6T parameters, V4-Pro is the largest open-weight model anyone has shipped — bigger than Moonshot's Kimi K2.6 (≈1.1T), more than double DeepSeek's own V3.2 (671B), and three and a half times the size of MiniMax M1.
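For a rough sense of how sparse these models are, the figures above can be turned into ratios. The following is a minimal Python sketch that uses only the parameter counts quoted in this piece; the dictionary layout and the function names are illustrative and are not anything DeepSeek ships.

```python
# Parameter counts quoted in the article, in billions.
MODELS = {
    "V4-Pro": {"total_b": 1600, "active_b": 49},
    "V4-Flash": {"total_b": 284, "active_b": 13},
}

# Total sizes of two models V4-Pro is compared against, in billions.
# Kimi K2.6 is quoted only approximately (about 1.1T).
RIVALS_TOTAL_B = {"Kimi K2.6": 1100, "DeepSeek V3.2": 671}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a Mixture-of-Experts model's parameters used per token."""
    return active_b / total_b


def size_ratio(big_b: float, small_b: float) -> float:
    """How many times larger one model is than another, by total parameters."""
    return big_b / small_b


if __name__ == "__main__":
    for name, p in MODELS.items():
        frac = active_fraction(p["total_b"], p["active_b"])
        print(f"{name}: {frac:.1%} of parameters active per token")
    for name, total_b in RIVALS_TOTAL_B.items():
        ratio = size_ratio(MODELS["V4-Pro"]["total_b"], total_b)
        print(f"V4-Pro is {ratio:.2f}x the size of {name}")
```

Run as a script, this shows that roughly 3% of V4-Pro's weights and under 5% of V4-Flash's are active for any given token, which is why total size and per-token compute can move in opposite directions.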
But the brag isn't the parameter count. It's what DeepSeek did to make a 1.6T model cheap to serve. The architecture uses a hybrid attention scheme that layers Compressed Sparse Attention on top of Heavily Compressed Attention, and it quantizes the MoE expert weights down to FP4. At a 1M-token context, the combined effect is to cut V4-Pro's inference FLOPs to 27% of V3.2's and its KV cache to 10%. V4-Flash is even more aggressive: 10% of the FLOPs and 7% of the KV cache.
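To make those percentages concrete, here is a small Python sketch that converts them into reduction factors and estimates how much storage 4-bit weights need. It is an illustration under stated assumptions, not DeepSeek's accounting: the function names are hypothetical, and the footprint estimate treats every weight as the same precision, whereas the article says only the MoE expert weights are FP4.

```python
# Ratios quoted in the article: cost at a 1M-token context relative to V3.2.
RELATIVE_TO_V32 = {
    "V4-Pro": {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}


def reduction_factor(relative_cost: float) -> float:
    """Turn 'X of the baseline' into 'N times less than the baseline'."""
    return 1.0 / relative_cost


def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, in GB, at one uniform precision.

    Simplifying assumption: every weight uses the same bit width. Since only
    the expert weights are said to be FP4, the 4-bit figure is a lower bound.
    """
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    for name, r in RELATIVE_TO_V32.items():
        print(
            f"{name}: {reduction_factor(r['flops']):.1f}x fewer FLOPs, "
            f"{reduction_factor(r['kv_cache']):.1f}x smaller KV cache than V3.2"
        )
    for bits in (16, 8, 4):
        print(f"1.6T weights at {bits}-bit: {weight_footprint_gb(1600, bits):.0f} GB")
```

Read this way, a KV cache at 10% of V3.2's means about ten times as many 1M-token contexts fit in the same cache memory, assuming the cache is the binding constraint on the serving hardware.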
Architecture at a Glance
V4-Pro
1.6T total params · 49B active · 61 layers · 384 experts · 33T training tokens
V4-Flash
284B total · 13B active · 1M context · Thinking & Non-Thinking modes
That is the whole story of frontier AI compressed into one chart. The model gets bigger; the bill gets smaller.
Where It Wins, and Where It Doesn't
DeepSeek's own marketing line is that V4-Pro "beats all rival open models for maths and coding," and the numbers back that up, if only narrowly. V4-Pro-Max posts a Codeforces rating of 3,206, which would place it 23rd among living human competitors and makes it the first open model to land cleanly in the closed-frontier zone on competitive programming.
[Benchmark charts: LiveCodeBench; HMMT 2026 February]
Now the bad news, which DeepSeek itself volunteers in the technical report: V4 still gets beaten on world knowledge (MMLU-Pro 87.5 vs Gemini's 91.0), on factual retrieval (SimpleQA-Verified 57.9 vs 75.6), on long-horizon agentic coding (Terminal Bench 2.0 67.9 vs GPT-5.4's 75.1), and on basically anything that isn't text. The lab frames its own position as "approximately 3 to 6 months" behind state-of-the-art.
V4 is also text-only. No audio, no video, no native images. In a year when GPT-5.5 and Gemini 3.1 are eating the multimodal table whole, that is a real omission, not a footnote.
The frontier is still in front. But for the first time, it's Chinese open weights, not California closed weights, that are pulling the average up.
The Price Weapon
If you can't be best, be cheapest by an order of magnitude.
$0.14
V4-Flash Input
Per million tokens — the most aggressive entry-level pricing in the open-weight market.
$0.435
V4-Pro (Promo)
75%-off launch rate through May 31, 2026. List price is $1.74/M input.
$5
GPT-5.5 Input
Closed frontier pricing — roughly 11.5× V4-Pro's input price at promo rates.
30×
Output Gap
At promo rates, V4-Pro's output tokens cost roughly 30× less than GPT-5.5's, a gap that adds up across millions of tool calls.
The pricing matters because it forces a question every CTO will now have to answer in writing: what exactly does the closed-source premium buy me? If your application is code generation, math-heavy reasoning, or agent loops, where an order-of-magnitude gap in output-token price (roughly 30× during the promo) adds up across millions of tool calls, the answer is starting to be "not enough."
That isn't undercutting. That's a different sport.
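The input-side arithmetic behind those price cards is easy to check. The sketch below uses only the input prices quoted above; the function names and the 2-billion-token monthly workload are hypothetical, and output-token prices are left out because this piece gives only their ratio.

```python
# Input prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_M = {
    "V4-Flash": 0.14,
    "V4-Pro (list)": 1.74,
    "GPT-5.5": 5.00,
}
PROMO_DISCOUNT = 0.75  # 75% off V4-Pro through May 31, 2026


def promo_price(list_price: float, discount: float) -> float:
    """Price after a fractional discount, e.g. 0.75 for 75% off."""
    return list_price * (1 - discount)


def input_cost(tokens: int, price_per_m: float) -> float:
    """USD cost of sending `tokens` input tokens at a per-million price."""
    return tokens / 1_000_000 * price_per_m


def price_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs per token."""
    return expensive / cheap


if __name__ == "__main__":
    prices = dict(INPUT_PRICE_PER_M)
    prices["V4-Pro (promo)"] = promo_price(prices["V4-Pro (list)"], PROMO_DISCOUNT)

    monthly_tokens = 2_000_000_000  # hypothetical workload, not from the article
    for name, price in sorted(prices.items(), key=lambda kv: kv[1]):
        ratio = price_ratio(prices["GPT-5.5"], price)
        cost = input_cost(monthly_tokens, price)
        print(f"{name}: ${price:.3f}/M, ${cost:,.0f}/month, GPT-5.5 costs {ratio:.1f}x")
```

On that hypothetical workload, input tokens alone come to about $10,000 a month on GPT-5.5 against about $870 on V4-Pro at the promo rate and $3,480 at list, so the gap narrows to just under 3× once the promotion ends.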
The Pile Is Crowded
V4 doesn't arrive into a vacuum. The open-weight roster has tightened sharply.
Kimi K2.6 — Moonshot
Remains the leader on long-horizon agents. ≈1.1T parameters.
GLM-5.1 — Zhipu
Still owns agentic web dev benchmarks.
Qwen 3.5/3.6 — Alibaba
Keeps shipping at a brutal cadence.
MiniMax M2.7
In the mix and competitive across multiple benchmarks.
MiMo V2 — Xiaomi
Just appeared. The field keeps expanding.
On BenchLM's open-source leaderboard, V4-Pro-Max edges Kimi 87 to 86 — a margin so thin it will probably flip again before summer. What's striking, looking at the chart above, is that nine of the ten largest open-weight models in the world are now Chinese. Llama is conspicuously absent. So is Mistral at scale. The open-source center of gravity has finished its migration east, and V4 just put a flag on top of the hill.
The Real Story Is the Chips
From Nvidia to Huawei
R1 was trained on Nvidia H800s — the export-controlled, deliberately neutered chips the U.S. allowed to ship into China before the rules tightened again.
V4 is not.
According to reporting from CNN and Bloomberg, V4 was trained and is being served on a Huawei "Supernode" cluster built around Ascend 950 accelerators, with additional support from Cambricon.
Why This Keeps Hawks Up at Night
As Wei Sun of Counterpoint Research put it, this "allows AI systems to be built and deployed without relying solely on Nvidia, which is why V4 could ultimately have an even bigger impact than R1."
The export-control thesis was always that you could slow Chinese AI by squeezing the supply of high-end accelerators. V4 is the first frontier-adjacent model that says, plainly:
The squeeze didn't work, and we have our own stack now.
A 1.6T-parameter MoE that trains and serves entirely on Chinese silicon is a structural fact — and it will outlast whichever lab is on top of LiveCodeBench this quarter.
Two Takeaways
Takeaway One
Open-weight is no longer the discount tier. It is a parallel frontier — three to six months behind on knowledge and multimodality, at parity or ahead on code and math, and an order of magnitude cheaper at the API. If you're still defaulting to a closed model for everything, you are paying a tax you may not need to pay.
Takeaway Two
The hardware story is the only story now. Benchmarks are noisy; supply chains are not. R1 was the moment the West noticed Chinese AI. V4 is the moment Chinese AI stopped needing the West to ship it.