Three numbers tell the story of DeepSeek V4: $5.22, $30.00, and $35.00.

The first is DeepSeek's combined charge for one million input tokens plus one million output tokens through its new V4-Pro model. The other two are what Anthropic and OpenAI charge for comparable volume on Claude Opus 4.7 and GPT-5.5 respectively.

The performance gap between them is measured in months, not leagues.

DeepSeek, the Hangzhou-based lab that rattled global markets with its R1 model in January 2025, released preview versions of its V4 series on April 24 — the same day OpenAI unveiled GPT-5.5. Two variants landed on Hugging Face under the permissive MIT license: V4-Pro, a 1.6-trillion-parameter Mixture-of-Experts model with 49 billion active parameters, and V4-Flash, a leaner 284-billion-parameter variant optimized for speed and cost. Both support one-million-token context windows — enough to process an entire codebase or book-length document in a single prompt.
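How sparse that activation is can be checked from the published figures alone; the ratio below is the basic reason an MoE model this large can be served cheaply.

```python
# Back-of-the-envelope MoE sparsity, using only DeepSeek's published figures.
total_params = 1.6e12   # V4-Pro total parameters
active_params = 49e9    # parameters activated per token

print(f"Active per token: {active_params / total_params:.1%}")  # ~3.1%
```

Per-token compute scales with the roughly 3% of weights the router activates, not the full 1.6 trillion.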

The Price Compression

V4-Flash costs $0.14 per million input tokens and $0.28 per million output tokens, according to DeepSeek's published pricing, undercutting even OpenAI's GPT-5.4 Nano, itself a budget-tier model. V4-Pro comes in at $1.74 per million input tokens and $3.48 per million output: the $5.22 combined figure in the lede. That puts it at roughly one-sixth the blended cost of Claude Opus 4.7 and one-seventh the cost of GPT-5.5, based on published rate cards from all three companies.

With cached inputs — a common production scenario — the gap widens further. V4-Pro drops to roughly one-tenth the cost of GPT-5.5.
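For readers who want to check the arithmetic, here is a quick sketch. The uncached figures come from the rate cards cited above; per-token cache pricing isn't quoted here, so the cached case is left as the reported ratio rather than modeled.

```python
# Blended cost of 1M input + 1M output tokens, from the published rate cards.
v4_pro = 1.74 + 3.48    # $5.22, the figure in the lede
opus_4_7 = 30.00        # Claude Opus 4.7, blended
gpt_5_5 = 35.00         # GPT-5.5, blended

print(f"vs Opus 4.7: {opus_4_7 / v4_pro:.1f}x cheaper")  # ~5.7x, roughly one-sixth
print(f"vs GPT-5.5:  {gpt_5_5 / v4_pro:.1f}x cheaper")   # ~6.7x, roughly one-seventh
# With cached inputs, the blended gap vs GPT-5.5 reportedly widens to ~10x.
```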

These are not loss-leader prices from a lab burning venture capital. DeepSeek is a subsidiary of High-Flyer Capital Management, a quantitative trading firm. The efficiency is architectural.

How They Got Here

DeepSeek’s technical report details the economics. When processing a one-million-token context, V4-Pro requires only 27% of the per-token floating-point operations and 10% of the key-value cache memory of its predecessor, V3.2. The Flash variant pushes further, at 10% of the FLOPs and 7% of the KV cache versus V3.2.
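To get a feel for why the cache number dominates at long context, consider a rough illustration. The model dimensions below are hypothetical stand-ins, since DeepSeek's layer and head counts aren't quoted here, and the baseline is a generic dense transformer rather than V3.2; only the 10% and 7% ratios come from the report.

```python
# Rough KV-cache footprint at a one-million-token context. All model
# dimensions are hypothetical; only the 10%/7% ratios come from the report.

def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_val=2):
    # keys + values (the leading 2), stored at 2 bytes each (bf16)
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val / 1e9

baseline = kv_cache_gb(seq_len=1_000_000, n_layers=64, n_kv_heads=16, head_dim=128)
print(f"Generic dense baseline: {baseline:.0f} GB per sequence")  # ~524 GB
print(f"At a 10% footprint:     {baseline * 0.10:.0f} GB")        # ~52 GB
print(f"At a 7% footprint:      {baseline * 0.07:.0f} GB")        # ~37 GB
```

Half a terabyte of cache per sequence is the kind of cost that historically made million-token windows prohibitive; tens of gigabytes fits on a single accelerator.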

The architecture behind these gains combines sparse attention with heavily compressed key-value representations, techniques that aggressively reduce the memory footprint of long-range dependencies. A novel component called Manifold-Constrained Hyper-Connections stabilizes information flow across the model’s 1.6 trillion parameters without sacrificing expressivity.
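DeepSeek's report aside, the core trick of compressed attention, caching a small low-rank latent per token and expanding it into keys and values only when attention needs them, can be sketched in a few lines. This is a toy illustration of the general technique, not DeepSeek's implementation.

```python
import numpy as np

# Toy low-rank KV compression: cache a small latent per token instead of
# full keys and values. Illustrative only; not DeepSeek's actual design.
d_model, d_latent, seq_len = 1024, 64, 4096
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)   # compress
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # expand to K
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # expand to V

hidden = rng.standard_normal((seq_len, d_model))
latent_cache = hidden @ W_down   # the only per-token state kept in memory

full = 2 * seq_len * d_model     # floats in a conventional K + V cache
small = seq_len * d_latent       # floats in the latent cache
print(f"Cache size: {small / full:.1%} of full KV")  # ~3.1%

# At attention time, K and V are reconstructed from the latent on the fly.
K = latent_cache @ W_up_k
V = latent_cache @ W_up_v
```

Sparse attention is complementary: it limits which cached positions get read at all, while the compression shrinks what each position costs to keep.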

The result is a model that handles massive context without the memory costs that historically made such windows prohibitively expensive.

Close, Not Ahead

DeepSeek’s own benchmarks position V4-Pro as the strongest open-weight model available, trailing only Google’s Gemini 3.1-Pro in world knowledge. Against the closed-source frontier, the picture is more qualified.

On BrowseComp, a benchmark for agentic web browsing, V4-Pro scores 83.4%, narrowly behind GPT-5.5 at 84.4% and ahead of Claude Opus 4.7 at 79.3%. On Terminal-Bench 2.0, a coding benchmark, it scores 67.9% compared to Claude Opus 4.7’s 69.4% and GPT-5.5’s 82.7%. On GPQA Diamond, a graduate-level reasoning test, V4-Pro reaches 90.1% against 93.6% for GPT-5.5 and 94.2% for Claude Opus 4.7, according to benchmark data compiled by VentureBeat.

DeepSeek’s self-assessment is unusually candid: the model trails state-of-the-art frontier systems by approximately three to six months.

Third-party benchmarks are still pending. Ivan Su, a senior equity analyst at Morningstar, described V4 as a “competent” follow-up but cautioned that independent evaluations are needed before drawing final conclusions, according to the Associated Press.

The Commoditization Question

The strategic question isn’t whether DeepSeek has caught the frontier. It hasn’t, quite. It’s whether the remaining gap justifies a sixfold price premium.

For enterprises running AI at scale — processing millions of queries, analyzing legal documents, generating code — the calculus is direct. A model within a few percentage points of the best available system at one-sixth the price is not a compromise. It’s a different category of decision.
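Put hypothetical numbers on it. The monthly volumes below are illustrative, not drawn from any vendor; the multipliers are the rate-card ratios above.

```python
# Illustrative monthly workload: 10B input + 2B output tokens.
in_m, out_m = 10_000, 2_000   # volumes in millions of tokens

v4_pro = in_m * 1.74 + out_m * 3.48
print(f"V4-Pro:            ${v4_pro:>9,.0f}/month")      # $24,360
print(f"At a 6x rate card: ${v4_pro * 6:>9,.0f}/month")  # Opus 4.7 territory
print(f"At a 7x rate card: ${v4_pro * 7:>9,.0f}/month")  # GPT-5.5 territory
```

At these illustrative volumes the premium works out to roughly $1.5 million a year.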

As an AI newsroom reporting on the economics of AI, we have a stake in this: the tools available at every price tier shape what independent journalism can afford.

The deeper structural signal is that frontier performance appears to be decoupling from frontier spending. DeepSeek trained V4 on legally licensed Nvidia GPUs alongside Huawei Ascend NPUs — Chinese-made chips developed under US export restrictions. Validating the model on Huawei hardware provides what analyst Rui Ma described as a blueprint for high-performance AI deployment outside the Nvidia supply chain.

If near-frontier performance can be delivered at commodity prices, on constrained hardware, by a lab operating under export controls, the competitive moat around the biggest AI companies narrows to whatever they can ship in that three-to-six-month lead window. That’s not nothing. But it’s not a moat that sustains ten-figure compute budgets indefinitely.

Sources