75% Off, Forever: DeepSeek's Price War Goes Structural

$0.87 per million output tokens. That is the new permanent price of DeepSeek V4-Pro — roughly a quarter of what the flagship model cost at launch in April, and a fraction of what Western rivals charge for comparable performance.

The Hangzhou-based startup confirmed that the 75% promotional discount on V4-Pro, originally set to expire May 31, will become the standard price going forward, according to the company’s updated API documentation. The move converts what looked like a temporary promotion into a structural bet: DeepSeek believes it can operate at costs its competitors cannot match, and it is willing to prove it indefinitely.

The new V4-Pro pricing sits at $0.003625 per million input tokens on cache hits, $0.435 on cache misses, and $0.87 per million output tokens. At full price, the model already undercut OpenAI’s GPT-5.5, Anthropic’s Claude Opus 4.7, and Google’s Gemini 3.1 Pro on a per-token basis, as The Next Web reported. The permanent discount widens that gap from a competitive edge to a different category entirely.
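To make the per-token figures concrete, here is a minimal sketch of a cost estimator built only on the three rates quoted above. The function names and the example token counts are illustrative assumptions, not part of DeepSeek's API, and the implied list price is simply back-calculated from the 75% discount.

```python
from decimal import Decimal

# USD per million tokens, as quoted in the article.
PRICE_PER_MILLION = {
    "input_cache_hit": Decimal("0.003625"),
    "input_cache_miss": Decimal("0.435"),
    "output": Decimal("0.87"),
}
MILLION = Decimal(1_000_000)


def request_cost(cache_hit_tokens: int, cache_miss_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the quoted rates (hypothetical helper)."""
    total = (
        cache_hit_tokens * PRICE_PER_MILLION["input_cache_hit"]
        + cache_miss_tokens * PRICE_PER_MILLION["input_cache_miss"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / MILLION


def implied_list_price(discounted: Decimal, discount: Decimal = Decimal("0.75")) -> Decimal:
    """Back out the pre-discount price implied by a discount rate (hypothetical helper)."""
    return discounted / (Decimal(1) - discount)


if __name__ == "__main__":
    # Example request: 8,000 uncached input tokens and 2,000 output tokens.
    print(request_cost(0, 8_000, 2_000))
    # Pre-discount output price implied by a 75% cut to $0.87.
    print(implied_list_price(PRICE_PER_MILLION["output"]))
```

Under these rates, a request with 8,000 uncached input tokens and 2,000 output tokens costs about half a cent, and the implied pre-discount output price works out to $3.48 per million tokens.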

Who Gets Squeezed

The pressure falls unevenly. OpenAI, which has cut API prices multiple times over the past year, now faces a challenger offering frontier-tier performance at roughly a quarter of the price. Anthropic, which has organized its Claude lineup around tiered pricing (lightweight Haiku models up to premium Opus), must reconcile premium positioning with a market where frontier and cheap are no longer contradictions. Google, which has progressively reduced Gemini API costs, has the infrastructure scale to price aggressively, but matching these levels would likely cut into its margins.

The squeeze is sharpest for enterprise buyers running agentic workloads — applications where AI models make repeated API calls, process large documents, and operate autonomously. For these users, token costs compound fast. DeepSeek’s separate decision to cut cache-hit prices to one-tenth of prior levels across its entire API, effective April 26, directly targets this pattern.
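A rough sketch shows how the cache-hit rate changes the bill for a repeated-call workload. The caching model here is a simplifying assumption: the first call pays the cache-miss rate for a shared context, and every later call is billed at the cache-hit rate for that same context. Real cache behavior may differ, and the workload numbers are hypothetical.

```python
from decimal import Decimal

# USD per million tokens, as quoted in the article.
HIT = Decimal("0.003625")   # cached input
MISS = Decimal("0.435")     # uncached input
OUT = Decimal("0.87")       # output
MILLION = Decimal(1_000_000)


def agent_run_cost(calls: int, shared_context_tokens: int,
                   new_tokens_per_call: int, output_tokens_per_call: int,
                   cached: bool = True) -> Decimal:
    """Estimate the USD cost of an agent run that re-sends a shared context each call."""
    total = Decimal(0)
    for i in range(calls):
        # Assumption: only calls after the first can hit the cache for the shared context.
        prefix_rate = HIT if (cached and i > 0) else MISS
        total += shared_context_tokens * prefix_rate
        total += new_tokens_per_call * MISS
        total += output_tokens_per_call * OUT
    return total / MILLION


if __name__ == "__main__":
    # Hypothetical workload: 50 calls over a 100,000-token shared context.
    with_cache = agent_run_cost(50, 100_000, 1_000, 500, cached=True)
    without_cache = agent_run_cost(50, 100_000, 1_000, 500, cached=False)
    print(with_cache, without_cache)
```

For this hypothetical run, the cached total comes to roughly $0.10 against roughly $2.22 uncached, which is why the cache-hit rate matters more than the headline output price for agents that resend large contexts.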

What the Cost Structure Signals

DeepSeek’s ability to sustain these prices rests on two advantages. First, V4-Pro is a mixture-of-experts architecture with 1.6 trillion total parameters but only 49 billion active per token, which means substantially less compute per inference than a dense model of equivalent capability. Second, V4 is trained and optimized for Huawei’s Ascend 950 chips and Cambricon hardware rather than Nvidia GPUs.
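The compute argument can be put in rough numbers. This sketch uses the parameter counts quoted above and a common rule of thumb of about 2 floating-point operations per active parameter for each generated token. That rule is an approximation, and comparing against a dense model with the same total parameter count is an illustrative assumption, not a claim made by DeepSeek or by this article.

```python
TOTAL_PARAMS = 1.6e12   # total parameters, per the article
ACTIVE_PARAMS = 49e9    # active parameters, per the article


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used in a single forward step."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs per token (rule of thumb: ~2 per parameter)."""
    return 2 * params


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.2%}")
    ratio = forward_flops_per_token(TOTAL_PARAMS) / forward_flops_per_token(ACTIVE_PARAMS)
    print(f"hypothetical dense model of equal size vs. active path: {ratio:.1f}x")
```

On these figures, only about 3% of the parameters are exercised at a time. The estimate ignores routing overhead, memory bandwidth, and attention cost, all of which affect real serving cost.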

Wei Sun, principal analyst at Counterpoint Research, noted that running on domestic chips allows AI systems to be built and deployed without relying solely on Nvidia, which could accelerate adoption domestically and contribute to faster global AI development overall. The significance is hard to overstate: US export controls have restricted China’s access to advanced Nvidia silicon. DeepSeek engineered around the constraint and priced the workaround to be commercially attractive to everyone else.

Whether this pricing is sustainable without state support remains unclear. DeepSeek has not disclosed compute costs or margins, and no independent audit of its economics exists. The visible strategy is systematic: open-source weights to remove access barriers, aggressive API pricing to remove cost barriers, native integration with agentic coding tools like Claude Code and OpenClaw to reduce switching friction, and a one-million-token context window to handle enterprise workloads out of the box.

The Political Undertow

The timing carries political weight. When DeepSeek first announced the promotional discount in late April, it came the same week that White House science policy director Michael Kratsios accused foreign entities — primarily Chinese — of conducting “industrial-scale” campaigns to distill frontier AI models from US companies. Kratsios’s memo did not name DeepSeek. But both Anthropic and OpenAI have previously accused the startup of distilling their models, according to reporting by The Next Web and Engadget.

DeepSeek’s response has been consistent: cut prices, don’t comment. The pricing page update contains no statement addressing the allegations. The move doubles as a political signal — the AI race, in DeepSeek’s view, will be decided on cost and access, not on regulatory fences.

For enterprise AI buyers, the calculus is now straightforward. A frontier-class model with a one-million-token context window, open weights, and API compatibility with both OpenAI and Anthropic formats is available at a fraction of every Western alternative’s price. The question is no longer whether DeepSeek can compete on cost. It is whether the companies it is undercutting can afford to keep up.

Sources

Models & Pricing - DeepSeek API Docs — DeepSeek
DeepSeek cuts V4-Pro prices by 75% and slashes cache costs across its entire API to a tenth — The Next Web
DeepSeek permanently reduces the price of its flagship V4 model by 75 percent — Engadget
DeepSeek V4 Preview Release — DeepSeek

Discussion (10)

vkrishnan

The MoE architecture point is underappreciated here. 49B active parameters out of 1.6T total means the inference cost per query is dramatically lower than what OpenAI or Anthropic are running with their dense models. This isn't just a pricing decision — it's an architectural advantage that compounds at scale. The Huawei Ascend integration is the real sleeper in this article. If they've genuinely optimized for domestic silicon at this level, it undercuts the entire premise of the export control regime. Worth reading the Fedus et al. paper on MoE scaling for context on why the routing efficiency matters so much.

14 ↑

definitely_not_a_bot

Wow what a well-structured, comprehensive paragraph with perfect grammar and a paper citation. Very natural human behavior. Nobody talks like that in a comment section.

9 ↑

realChadM

75% off forever is insane. been paying openai hundreds a month for API calls. switching today honestly, how can you not at these prices

3 ↑

Diane

Am I the only one who finds the timing suspicious? They make this permanent the same week the White House accuses them of distilling US models, and their response is just... cutting prices again? No comment, no denial? That's not confidence, that's avoidance. Also "no independent audit of its economics exists" is doing a LOT of heavy lifting in this article. We're just taking their word for it?

21 ↑

tepid_ocean

They don't need to deny anything. The strategy is clearly "make it so cheap that nobody cares about the allegations." And honestly? It's working. Go look at any enterprise AI forum right now — people are already migrating workloads.

7 ↑

sarah_k

$0.87 per million output tokens is crazy

2 ↑

MarcusLee

I run a small e-commerce site and use Claude for product descriptions. At these prices I could run descriptions through like 10 iterations and still save money. Anyone know if DeepSeek is good at creative writing tasks though? That's my main use case and I don't want to switch if the quality drops.

1 ↑

jen_in_philly

creative writing is literally the one thing you should NOT use deepseek for lmao. stick with claude for that. v4-pro is a reasoning model not a creative one you'll see the difference immediately

DataWrangler2049

The cache hit price dropping to 1/10th is the actual story buried in here, not the headline 75% number. If you're running agentic workloads with lots of repeated context, that's where the real savings compound. $0.003625 per million input tokens on cache hits is basically free. That's what makes this lethal for OpenAI and Anthropic with their coding tools — DeepSeek isn't just cheaper, they're pricing the repeated-call pattern that agents rely on at near-zero.

5 ↑

brendan_t

Remember when everyone said China was 5 years behind in AI? Now they're running frontier models on chips we literally banned them from buying, at prices we can't match, and the response from DC is a strongly worded memo from Kratsios. This entire export control strategy has backfired spectacularly. They just built alternatives faster.

6 ↑
