The AI industry spent years in an arms race to build the biggest models money could buy. Now a Spanish startup is making a business out of shrinking them back down.

Multiverse Computing has launched a self-serve API portal and a consumer chat app, both powered by compressed versions of models from OpenAI, Meta, DeepSeek, and Mistral AI. The pitch is simple: near-frontier performance at a fraction of the compute cost. The company claims its CompactifAI technology — inspired by quantum computing principles — can reduce model size by up to 95% while losing only 2–3% accuracy.
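Multiverse has not published the full details of its pipeline, but quantum-inspired compression techniques are generally built on tensor-network factorizations of a model's weight matrices. The snippet below is a generic, minimal sketch of the underlying idea — replacing a dense weight matrix with a truncated low-rank factorization — and is not CompactifAI's actual method; the matrix sizes, rank, and noise level are illustrative assumptions.

```python
import numpy as np

# Generic illustration of low-rank weight compression.
# NOT CompactifAI's actual method: tensor-network approaches (e.g. matrix
# product operators) generalize this single factorization to many factors.

rng = np.random.default_rng(0)

# Build a synthetic, approximately low-rank "weight matrix" (assumption:
# real trained weights often have rapidly decaying singular values).
L = rng.standard_normal((512, 64))
R = rng.standard_normal((64, 512))
W = L @ R + 0.1 * rng.standard_normal((512, 512))

# Truncated SVD: keep only the top-k singular directions.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 64
A = U[:, :k] * s[:k]   # shape (512, k)
B = Vt[:k, :]          # shape (k, 512)

orig_params = W.size
compressed_params = A.size + B.size
print(f"params: {orig_params} -> {compressed_params} "
      f"({compressed_params / orig_params:.0%} of original)")

# The reconstruction error is the "accuracy loss" side of the trade-off.
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

At matmul time, computing `x @ A @ B` costs far fewer operations than `x @ W`, which is where the inference savings come from; the open question for any such scheme is how small the rank can go before accuracy degrades.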

The flagship product is HyperNova 60B, a compressed version of OpenAI’s gpt-oss-120B. It weighs 32GB, down from 61GB, and ships free on Hugging Face. Company benchmarks show a 5x improvement in agentic tool use and a 2x improvement in coding performance over the previous iteration. A partnership with Axelera AI, announced March 18, will bring the compressed models to edge computing platforms, where latency, privacy, and energy consumption are critical.

The business case writes itself. Enterprises running large language models face GPU bills that scale with parameter count. Multiverse says its compression significantly cuts processing time and inference costs. Clients including Iberdrola, Bosch, and the Bank of Canada are already using the technology, and the company — which raised a $215 million Series B — is in discussions for another funding round.

“By combining Multiverse’s advanced compressed AI models with Axelera’s high-performance edge platforms, we can bring powerful reasoning capabilities to devices where latency, privacy and energy consumption are critical,” Multiverse CEO Enrique Lizaso said.

An industry that burned through billions of dollars in compute to make models larger is now paying a startup to make them smaller again. The math has a certain poetry to it.
