With private company defaults reaching 9.2%, the highest rate in years, VC firm Lux Capital recently advised companies relying on AI to get their computing capacity commitments confirmed in writing. With financial instability rippling through the AI supply chain, Lux warned, a handshake agreement is not enough.
But there is a completely different option: stop relying on external computing infrastructure altogether. Smaller AI models that run directly on a user’s own device, with no data center, no cloud provider, and no counterparty risk, are becoming good enough to be worth considering. Multiverse Computing is betting on exactly that.
The Spanish startup has so far kept a lower profile than some of its peers, but as demand for AI efficiency grows, that is changing. Having compressed models from major AI labs, including OpenAI, Meta, DeepSeek and Mistral AI, it has launched both an app that showcases the capabilities of its compressed models and an API portal that lets developers access and build with them.
The CompactifAI app, which shares its name with Multiverse’s quantum-inspired compression technology, is an AI chat tool along the lines of ChatGPT or Mistral’s Le Chat. Ask a question and the model answers. The difference is that it embeds Gilda, a model small enough, according to the company, to run locally and offline.
For end users, this is a taste of AI at the edge: data doesn’t leave their devices, and no connection is required. There is a caveat, though: their mobile devices must have enough RAM and storage. If they don’t, and many older iPhones won’t, the app falls back to cloud-based models via API. Routing between local and cloud processing is handled automatically by a system Multiverse has named Ash Nazg, a name Tolkien fans will recognize from the One Ring inscription in “The Lord of the Rings.” When the app routes to the cloud, however, it loses its key privacy advantage.
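Multiverse hasn’t published how Ash Nazg makes its decision, but the behavior described — run on-device when RAM and storage allow, otherwise fall back to a cloud model over API — can be sketched roughly as follows. All names and thresholds here are illustrative assumptions, not Multiverse’s actual implementation:

```python
from dataclasses import dataclass

# Illustrative thresholds -- assumed values, not Multiverse's real figures.
MIN_RAM_GB = 6.0      # memory assumed necessary to hold the local model
MIN_STORAGE_GB = 4.0  # disk space assumed necessary for the model weights

@dataclass
class Device:
    ram_gb: float
    free_storage_gb: float

def run_local(prompt: str) -> str:
    """Stand-in for on-device inference with a compressed model."""
    return f"[local] {prompt}"

def run_cloud(prompt: str) -> str:
    """Stand-in for a cloud model reached over an API."""
    return f"[cloud] {prompt}"

def route_request(device: Device, prompt: str) -> str:
    """Prefer on-device inference when the hardware allows it.

    Local processing keeps data on the device; the cloud fallback
    answers anyway, but gives up that privacy advantage.
    """
    if device.ram_gb >= MIN_RAM_GB and device.free_storage_gb >= MIN_STORAGE_GB:
        return run_local(prompt)
    return run_cloud(prompt)
```

A recent phone with 8 GB of RAM would be routed locally by this sketch, while an older device with 3 GB would silently fall back to the cloud — matching the app behavior described above.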
These limitations mean CompactifAI isn’t quite ready for mass customer adoption yet, though that may never have been the goal. According to data from Sensor Tower, the app had fewer than 5,000 downloads in the past month.
The real target is companies. Today, Multiverse is launching a self-service API portal that gives developers and enterprises direct access to its compressed models—no AWS Marketplace required.
“The CompactifAI API Portal gives developers direct access to compressed models with the transparency and control needed to run them in production,” CEO Enrique Lizaso said in a statement.
Real-time usage monitoring is one of the API’s key features, and that’s no accident. Beyond the potential benefits of deploying at the edge, lower compute costs are one of the main reasons companies consider smaller models as an alternative to large language models (LLMs).
It also helps that small models are less limited than they used to be. Earlier this week, Mistral updated its small model family with the launch of Mistral Small 4, which it says is simultaneously optimized for general chat, coding, agentic tasks and reasoning. The French company also released Forge, a system that lets companies build custom models, including small ones, choosing the trade-offs their use cases can best tolerate.
Multiverse’s latest results also suggest that the gap with LLMs is closing. Its newest compressed model, HyperNova 60B, is built on gpt-oss-120b, an OpenAI model whose weights are openly available. The company claims it delivers faster responses at lower cost than the original it was derived from, an advantage that matters especially for agentic coding workflows, where AI autonomously completes complex, multi-step programming tasks.
Making models small enough to run on mobile devices while remaining useful is a big challenge. Apple Intelligence sidesteps the problem by combining an on-device model with a cloud model. Multiverse’s CompactifAI app can also route requests to gpt-oss-120b via API, but its main goal is to show that local models like Gilda and its successors have benefits beyond cost savings.
For workers in critical areas, a model that can run locally and without connecting to the cloud offers more privacy and resiliency. But the greater value is in the business uses this can unlock – for example, embedding artificial intelligence in drones, satellites and other settings where connectivity cannot be taken for granted.
The company already serves more than 100 global clients, including Bank of Canada, Bosch and Iberdrola, but expanding its client base could help it unlock more funding. After raising a $215 million Series B last year, it is now rumored to be raising a fresh €500 million round at a valuation of more than €1.5 billion.
