
    DeepSeek V4 at $0.14 per Million Tokens. I’m Watching, Not Switching.

    A cost-and-risk breakdown for freelancers who pay $100/month on AI tools and wonder if a Chinese open-source model just made that obsolete.

    Content mode: Informed — Field Report


    $100 a month — that’s what I spend keeping Claude Pro, ChatGPT Plus, Notion AI, Perplexity Pro, and Cursor Pro running on two monitors. On April 24, 2026, DeepSeek released V4 in two variants: V4-Flash at $0.14 per million input tokens and V4-Pro at $1.74 — roughly one-sixth what Claude Opus charges. I haven’t used this yet, but the question I keep circling back to is simple: at what point does “cheaper and almost as good” become “good enough to rethink my stack”?

    Two models shipped, one pricing shock

    DeepSeek dropped two open-source models under MIT license — V4-Flash (284 billion parameters) and V4-Pro (1.6 trillion parameters) — both with a 1-million-token context window and up to 384K output tokens.

    Spec               | V4-Flash    | V4-Pro
    Parameters         | 284B        | 1.6T
    Context window     | 1M tokens   | 1M tokens
    Input (cache miss) | $0.14 / 1M  | $1.74 / 1M
    Input (cache hit)  | $0.028 / 1M | $0.145 / 1M
    Output             | $0.28 / 1M  | $3.48 / 1M

    My take: V4-Flash is the attention-grabber — $0.14 input is 95% cheaper than Claude Sonnet. V4-Pro is where the real capability sits, and even that undercuts Opus by 7x on output pricing.

    The architecture upgrade is real. DeepSeek introduced what it calls “Hybrid Attention Architecture,” which cuts single-token inference compute to 27% of V3.2’s requirements and reduces KV cache to 10% at the full 1M-token context — the kind of efficiency gain that makes the low pricing sustainable, not just a loss-leader stunt. Huawei’s Ascend 950 chips handle at least part of training and inference through a “Supernode” cluster partnership, though the full infrastructure breakdown remains undisclosed (per CNBC).

    [Image] DeepSeek V4 pricing — V4-Flash starts at $0.14/M input tokens, undercutting every frontier model by 10x or more.

    Close to the frontier, but not past it

    V4-Pro-Max scores 90.1% on GPQA Diamond — within four points of Claude Opus 4.7’s 94.2% and GPT-5.5’s 93.6%. On Humanity’s Last Exam without tools, V4-Pro lands at 37.7%, behind GPT-5.5 (41.4%) and Claude Opus 4.7 (46.9%). It’s the strongest open-source model on the board, but frontier models still hold a measurable lead on the hardest reasoning tasks.

    Benchmark snapshot (April 2026)
    GPQA Diamond — V4-Pro-Max: 90.1% · Claude Opus 4.7: 94.2% · GPT-5.5: 93.6%
    SWE-bench Verified — V4-Pro: 80.6% · Claude Opus 4.6: 80.8%
    HLE (no tools) — V4-Pro: 37.7% · GPT-5.5: 41.4% · Claude Opus 4.7: 46.9%

    Coding tells a different story. V4-Pro hits a 3,206 Codeforces rating, edging past GPT-5.4’s 3,168. On Terminal-Bench 2.0, it scores 67.9% versus Claude’s 65.4%. On SWE-bench Verified, it’s essentially tied with Opus 4.6 — 80.6% versus 80.8%. Vals AI’s independent Vibe Code Benchmark found V4 “overwhelmingly” topped the open-source field, defeating several closed-source models including Gemini 3.1 Pro.

    Bloomberg’s headline was blunt: “fails to narrow US lead in AI.” But that framing misses what matters for someone paying per token. The story isn’t whether V4 is the smartest model alive — it’s that near-frontier intelligence now costs one-sixth to one-seventh of what Claude Opus or GPT-5.5 charges.

    [Image] DeepSeek’s data infrastructure runs entirely on Chinese servers — a factor that shapes every freelancer’s cost-benefit calculation.

    The real math: what this costs versus what I pay now

    “The story isn’t whether V4 is the smartest model alive — it’s that near-frontier intelligence now costs one-sixth of what Claude Opus charges.”

    Here’s the pricing landscape as of this week:

    Model             | Input / 1M tokens | Output / 1M tokens
    DeepSeek V4-Flash | $0.14             | $0.28
    DeepSeek V4-Pro   | $1.74             | $3.48
    Claude Sonnet 4.6 | $3.00             | $15.00
    Claude Opus 4.6   | $5.00             | $25.00
    GPT-5.5           | $5.00             | $30.00

    VentureBeat’s independent evaluation called V4-Pro “near state-of-the-art intelligence at 1/6th the cost of Opus 4.7.” A 3,000-word draft on V4-Flash costs roughly $0.002 — two-tenths of a cent. My monthly $15–25 API overflow spend could theoretically drop below $3 for routine tasks.

    Cost scenario — 100 drafts/month at 3,000 words each
    V4-Flash: ~$0.20 total · Claude Sonnet: ~$4.50 total · Claude Opus: ~$7.50 total
    That’s a 37x cost difference between V4-Flash and Opus on the same workload.
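    The arithmetic behind scenarios like this is worth doing yourself, because real spend depends heavily on token counts. A minimal sketch, using the per-token prices from the table above; the per-draft token counts (roughly 1.33 tokens per word is a common rule of thumb) are my own assumptions, not figures from the article:

```python
# Rough per-workload cost calculator. Prices are USD per 1M tokens
# (cache-miss input rates) taken from the pricing table above.
PRICES_PER_M = {
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (1.74, 3.48),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-opus-4.6": (5.00, 25.00),
    "gpt-5.5": (5.00, 30.00),
}

def monthly_cost(model, drafts=100, input_tokens=1_500, output_tokens=4_000):
    """Estimated monthly spend for `drafts` runs at the given token counts.

    Token counts per draft are illustrative assumptions, not measured values.
    """
    price_in, price_out = PRICES_PER_M[model]
    per_draft = (input_tokens * price_in + output_tokens * price_out) / 1e6
    return drafts * per_draft

for model in PRICES_PER_M:
    print(f"{model}: ${monthly_cost(model):,.2f}/mo")
```

    Whatever token counts you plug in, the ratio between V4-Flash and the frontier models stays in the double digits; prompt caching (cache-hit input is 5x cheaper per the spec table) widens it further for repetitive workloads.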

    But “could” is doing heavy lifting in that sentence. The savings only matter if the privacy trade-off is one I can accept.

    Your client data would live on Chinese servers

    DeepSeek stores all data on servers in the People’s Republic of China. Under China’s 2017 National Intelligence Law, the government can compel access with no legal mechanism for the company to resist and no obligation to notify users (per IAPP). Feroot Security found hidden code in DeepSeek’s web chat capable of transmitting user data to China Mobile’s registry.

    The regulatory response has been broad:

    Government bans and investigations (as of April 2026)
    Italy — chatbot banned within 72 hours of R1 launch
    EU — 13 jurisdictions opened formal investigations; EDPB created dedicated AI Enforcement Task Force
    US — banned on federal government devices + multiple state agencies
    Also banned: Australia, Taiwan, South Korea, Czech Republic, Netherlands (government devices)

    These restrictions predate V4, but the underlying data-sovereignty architecture is unchanged.

    For a solo operator handling client proposals and strategy docs, the line is clear:

    • Client deliverables, financials, proprietary strategy? Not through DeepSeek’s API. Full stop.
    • Personal research on public data — SEC filings, published reports? Lower stakes, but regulated-industry pitches still carry risk.
    • Generic code scripts that don’t touch client data? Probably fine, but “probably” is a word I don’t love when a client’s name is in the file.

    The workaround is self-hosting the open-source weights locally. V4-Flash at 284B parameters is within reach for quantized deployment on consumer hardware with 64GB+ RAM. V4-Pro at 1.6 trillion parameters needs datacenter infrastructure most freelancers don’t have.
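    That “64GB+ RAM” figure deserves a sanity check. A back-of-envelope weight-memory calculation (ignoring KV cache and runtime overhead) shows that even aggressive quantization of 284B parameters overshoots 64GB, so local deployment would also depend on MoE expert offloading or weight streaming from disk; treat the hardware bar as an open question:

```python
# Back-of-envelope: RAM needed just to hold model weights at a given
# quantization level. Ignores KV cache and runtime overhead entirely.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Decimal GB of memory for the raw weights alone."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for bits in (16, 8, 4, 2):
    print(f"V4-Flash (284B) @ {bits}-bit: {weight_gb(284, bits):,.0f} GB")
```

    At 4-bit that's 142 GB, and even 2-bit lands at 71 GB, so a 64GB machine only works if the inference stack avoids holding every expert in memory at once.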

    Where I’d use it — and where I wouldn’t touch it

    The honest answer is narrow. DeepSeek V4 fits a specific lane:

    • Bulk summarization of public data — earnings calls, research papers, regulatory filings. High volume, low sensitivity, and the cost difference compounds.
    • Personal code automation — file cleanup scripts, CSV transforms, the kind of work I currently use Cursor for but that doesn’t touch client projects.
    • Cheap second-opinion runs — run the same prompt through V4-Flash and Claude, compare outputs. At $0.14 per million tokens, double-checking is essentially free.
    • Draft generation for my own content — blog outlines, research notes. Not client work.

    Where it doesn’t fit: anything with client names, strategies, financials, or proprietary data. That’s most of what I do on a given Tuesday.
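    The “cheap second-opinion” lane above is mostly plumbing: send the identical prompt to two OpenAI-compatible endpoints and diff the replies. A minimal sketch of the fan-out structure; the model IDs and the DeepSeek base URL here are placeholders I'm assuming, not confirmed V4 identifiers, and the actual HTTP calls are deliberately omitted:

```python
# Sketch: build identical chat-completion payloads for two providers,
# then POST each to f"{base_url}/chat/completions" with your API key
# (network code omitted; this shows only the fan-out shape).
PROVIDERS = [
    # (label, base_url, model) -- placeholder IDs, check each provider's docs
    ("deepseek", "https://api.deepseek.com/v1", "deepseek-v4-flash"),
    ("openai", "https://api.openai.com/v1", "gpt-5.5"),
]

def build_request(model: str, prompt: str) -> dict:
    """One OpenAI-style chat payload, identical across providers."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # keep runs comparable across models
    }

def second_opinion(prompt: str) -> dict:
    """Fan the same prompt out to every provider, keyed by label."""
    return {label: build_request(model, prompt)
            for label, _base_url, model in PROVIDERS}
```

    The point of pinning temperature and reusing one payload is that any divergence you see is the model, not the prompt.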


    For me, DeepSeek V4 is a “watch,” not a “switch.” The performance-per-dollar is the best I’ve seen from any model — open or closed — and the open-source weights under MIT license mean the gap between “interesting model” and “thing I actually use” could close faster than expected if self-hosting tools catch up. But today, routing client work through Chinese servers isn’t a trade-off I’m willing to make for a 6x cost reduction. If local deployment of the 284B Flash model becomes genuinely turnkey — not “turnkey for someone with a homelab” but turnkey for someone who bills by the hour and needs it to just work — that changes the math entirely.

    FAQ

    Can I use DeepSeek V4 for free?

    Yes. Free web chat at chat.deepseek.com and a generous API free tier. But the web chat routes every input through Chinese servers. For any real work, use the API with non-sensitive data or self-host the weights.

    How does V4 compare to Claude for long-form writing?

    It’s weaker. V4-Pro matches Claude on reasoning benchmarks, but early user reports suggest Claude still holds a clear edge on long-form coherence past the 3,000-word mark — the exact territory where client deliverables live. Claude Opus 4.6 also leads on long-context retrieval benchmarks like MRCR v2.

    Should I cancel Claude Pro or ChatGPT Plus?

    No. DeepSeek V4 is a supplementary tool for cost-sensitive, non-sensitive workloads. Claude and ChatGPT still lead on writing quality, integration ecosystems, and data privacy guarantees. The $20/month you pay for Claude Pro buys trust that $0.14 per million tokens doesn’t.

    Is DeepSeek V4 legal to use?

    Yes — for personal and business use. It’s banned on federal government devices and in several state agencies. For freelancers: legal, but don’t route client data through it unless you’re self-hosting the open-source weights on your own infrastructure.

    Can I run V4 locally?

    Yes, with caveats. V4-Flash (284B) can run on consumer hardware with 64GB+ RAM using quantized versions. V4-Pro (1.6T) requires serious GPU clusters. Hugging Face hosts the weights. “Can run” and “runs well enough for production freelance work” are different questions — I’d want to see community benchmarks on local inference quality before committing.


    Pricing comparison

    Model                       | Monthly cost (est. freelancer usage) | Best for
    DeepSeek V4-Flash (API)     | ~$1–3/mo   | Bulk summarization, code scripts, research on public data
    DeepSeek V4-Pro (API)       | ~$5–15/mo  | Near-frontier reasoning tasks, non-sensitive work
    Claude Pro (subscription)   | $20/mo     | Client deliverables, long-form writing, sensitive data
    ChatGPT Plus (subscription) | $20/mo     | Brainstorming, short-form, meeting summaries

    My take: V4-Flash is the most interesting play here — cheap enough to use as a second-opinion layer alongside your primary Claude or ChatGPT subscription, without replacing either.

    My recommendation: Try Claude Pro for client work first →


    AI-assisted research and drafting. Reviewed and published by ToolMint. Last updated: 2026-04-25.
