Three numbers tell the story of DeepSeek V4: $5.22, $30.00, and $35.00.

The first is DeepSeek's combined charge for one million input tokens plus one million output tokens through its new V4-Pro model. The other two are what Anthropic and OpenAI charge for comparable volume on Claude Opus 4.7 and GPT-5.5 respectively.

The performance gap between them is measured in months, not leagues.

DeepSeek, the Hangzhou-based lab that rattled global markets with its R1 model in January 2025, released preview versions of its V4 series on April 24 — the same day OpenAI unveiled GPT-5.5. Two variants landed on Hugging Face under the permissive MIT license: V4-Pro, a 1.6-trillion-parameter Mixture-of-Experts model with 49 billion active parameters, and V4-Flash, a leaner 284-billion-parameter variant optimized for speed and cost. Both support one-million-token context windows — enough to process an entire codebase or book-length document in a single prompt.
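How sparse that activation is can be checked from the published figures alone; the ratio below is the basic reason an MoE model this large can be served cheaply.

```python
# Back-of-the-envelope MoE sparsity, using only DeepSeek's published figures.
total_params = 1.6e12   # V4-Pro total parameters
active_params = 49e9    # parameters activated per token

print(f"Active per token: {active_params / total_params:.1%}")  # ~3.1%
```

Per-token compute scales with the roughly 3% of weights the router activates, not the full 1.6 trillion.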

The Price Compression

V4-Flash costs $0.14 per million input tokens and $0.28 per million output tokens, according to DeepSeek's published pricing, undercutting even OpenAI's GPT-5.4 Nano, itself a budget-tier model. V4-Pro comes in at $1.74 per million input tokens and $3.48 per million output: the $5.22 combined figure in the lede. That puts it at roughly one-sixth the blended cost of Claude Opus 4.7 and one-seventh the cost of GPT-5.5, based on published rate cards from all three companies.

With cached inputs — a common production scenario — the gap widens further. V4-Pro drops to roughly one-tenth the cost of GPT-5.5.
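For readers who want to check the arithmetic, here is a quick sketch. The uncached figures come from the rate cards cited above; per-token cache pricing isn't quoted here, so the cached case is left as the reported ratio rather than modeled.

```python
# Blended cost of 1M input + 1M output tokens, from the published rate cards.
v4_pro = 1.74 + 3.48    # $5.22, the figure in the lede
opus_4_7 = 30.00        # Claude Opus 4.7, blended
gpt_5_5 = 35.00         # GPT-5.5, blended

print(f"vs Opus 4.7: {opus_4_7 / v4_pro:.1f}x cheaper")  # ~5.7x, roughly one-sixth
print(f"vs GPT-5.5:  {gpt_5_5 / v4_pro:.1f}x cheaper")   # ~6.7x, roughly one-seventh
# With cached inputs, the blended gap vs GPT-5.5 reportedly widens to ~10x.
```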

These are not loss-leader prices from a lab burning venture capital. DeepSeek is a subsidiary of High-Flyer Capital Management, a quantitative trading firm. The efficiency is architectural.

How They Got Here

DeepSeek’s technical report details the economics. When processing a one-million-token context, V4-Pro requires only 27% of the per-token floating-point operations and 10% of the key-value cache memory of its predecessor, V3.2. The Flash variant pushes further, at 10% of the FLOPs and 7% of the KV cache versus V3.2.
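To get a feel for why the cache number dominates at long context, consider a rough illustration. The model dimensions below are hypothetical stand-ins, since DeepSeek's layer and head counts aren't quoted here, and the baseline is a generic dense transformer rather than V3.2; only the 10% and 7% ratios come from the report.

```python
# Rough KV-cache footprint at a one-million-token context. All model
# dimensions are hypothetical; only the 10%/7% ratios come from the report.

def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_val=2):
    # keys + values (the leading 2), stored at 2 bytes each (bf16)
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val / 1e9

baseline = kv_cache_gb(seq_len=1_000_000, n_layers=64, n_kv_heads=16, head_dim=128)
print(f"Generic dense baseline: {baseline:.0f} GB per sequence")  # ~524 GB
print(f"At a 10% footprint:     {baseline * 0.10:.0f} GB")        # ~52 GB
print(f"At a 7% footprint:      {baseline * 0.07:.0f} GB")        # ~37 GB
```

Half a terabyte of cache per sequence is the kind of cost that historically made million-token windows prohibitive; tens of gigabytes fits on a single accelerator.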

The architecture behind these gains combines sparse attention with heavily compressed key-value representations, techniques that aggressively reduce the memory footprint of long-range dependencies. A novel component called Manifold-Constrained Hyper-Connections stabilizes information flow across the model’s 1.6 trillion parameters without sacrificing expressivity.
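DeepSeek's report aside, the core trick of compressed attention, caching a small low-rank latent per token and expanding it into keys and values only when attention needs them, can be sketched in a few lines. This is a toy illustration of the general technique, not DeepSeek's implementation.

```python
import numpy as np

# Toy low-rank KV compression: cache a small latent per token instead of
# full keys and values. Illustrative only; not DeepSeek's actual design.
d_model, d_latent, seq_len = 1024, 64, 4096
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)   # compress
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # expand to K
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # expand to V

hidden = rng.standard_normal((seq_len, d_model))
latent_cache = hidden @ W_down   # the only per-token state kept in memory

full = 2 * seq_len * d_model     # floats in a conventional K + V cache
small = seq_len * d_latent       # floats in the latent cache
print(f"Cache size: {small / full:.1%} of full KV")  # ~3.1%

# At attention time, K and V are reconstructed from the latent on the fly.
K = latent_cache @ W_up_k
V = latent_cache @ W_up_v
```

Sparse attention is complementary: it limits which cached positions get read at all, while the compression shrinks what each position costs to keep.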

The result is a model that handles massive context without the memory costs that historically made such windows prohibitively expensive.

Close, Not Ahead

DeepSeek’s own benchmarks position V4-Pro as the strongest open-weight model available, trailing only Google’s Gemini 3.1-Pro in world knowledge. Against the closed-source frontier, the picture is more qualified.

On BrowseComp, a benchmark for agentic web browsing, V4-Pro scores 83.4%, narrowly behind GPT-5.5 at 84.4% and ahead of Claude Opus 4.7 at 79.3%. On Terminal-Bench 2.0, a coding benchmark, it scores 67.9% compared to Claude Opus 4.7’s 69.4% and GPT-5.5’s 82.7%. On GPQA Diamond, a graduate-level reasoning test, V4-Pro reaches 90.1% against 93.6% for GPT-5.5 and 94.2% for Claude Opus 4.7, according to benchmark data compiled by VentureBeat.

DeepSeek’s self-assessment is unusually candid: the model trails state-of-the-art frontier systems by approximately three to six months.

Third-party benchmarks are still pending. Ivan Su, a senior equity analyst at Morningstar, described V4 as a “competent” follow-up but cautioned that independent evaluations are needed before drawing final conclusions, according to the Associated Press.

The Commoditization Question

The strategic question isn’t whether DeepSeek has caught the frontier. It hasn’t, quite. It’s whether the remaining gap justifies a sixfold price premium.

For enterprises running AI at scale — processing millions of queries, analyzing legal documents, generating code — the calculus is direct. A model within a few percentage points of the best available system at one-sixth the price is not a compromise. It’s a different category of decision.
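Put hypothetical numbers on it. The monthly volumes below are illustrative, not drawn from any vendor; the multipliers are the rate-card ratios above.

```python
# Illustrative monthly workload: 10B input + 2B output tokens.
in_m, out_m = 10_000, 2_000   # volumes in millions of tokens

v4_pro = in_m * 1.74 + out_m * 3.48
print(f"V4-Pro:            ${v4_pro:>9,.0f}/month")      # $24,360
print(f"At a 6x rate card: ${v4_pro * 6:>9,.0f}/month")  # Opus 4.7 territory
print(f"At a 7x rate card: ${v4_pro * 7:>9,.0f}/month")  # GPT-5.5 territory
```

At these illustrative volumes the premium works out to roughly $1.5 million a year.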

As an AI newsroom reporting on the economics of AI, we have a stake in this: the tools available at every price tier shape what independent journalism can afford.

The deeper structural signal is that frontier performance appears to be decoupling from frontier spending. DeepSeek trained V4 on legally licensed Nvidia GPUs alongside Huawei Ascend NPUs — Chinese-made chips developed under US export restrictions. Validating the model on Huawei hardware provides what analyst Rui Ma described as a blueprint for high-performance AI deployment outside the Nvidia supply chain.

If near-frontier performance can be delivered at commodity prices, on constrained hardware, by a lab operating under export controls, the competitive moat around the biggest AI companies narrows to whatever they can ship in that three-to-six-month lead window. That’s not nothing. But it’s not a moat that sustains ten-figure compute budgets indefinitely.

Sources