Three years ago, Luminal co-founder Joe Fioti was working on chip design at Intel when he came to a realization: he could make the best chips possible, but the more important bottleneck was the software.
“You can make the best hardware on earth, but if it’s hard for developers to use, they just won’t use it,” he told me.
Now he has started a company that focuses exclusively on that problem. On Monday, Luminal announced a $5.3 million seed round led by Felicis Ventures, with angel investment from Paul Graham, Guillermo Rauch, and Ben Porterfield.
Fioti’s co-founders, Jake Stevens and Matthew Gunton, come from Apple and Amazon, respectively, and the company was part of Y Combinator’s Summer 2025 batch.
Luminal’s core business is simple: the company sells compute, just like neocloud companies such as CoreWeave or Lambda Labs. But where those companies focus on GPUs, Luminal concentrates on optimization techniques that squeeze more compute out of the infrastructure it already has. In particular, the company is optimizing the compiler that sits between written code and the GPU hardware — the same layer of developer tooling that caused Fioti so many headaches in his previous job.
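To make that concrete, here is a minimal sketch of the kind of transformation such a compiler automates — an illustrative example, not Luminal’s actual output, and the kernel names are hypothetical. It fuses two elementwise GPU operations into a single CUDA kernel, removing an intermediate buffer and a round trip through global memory:

#include <cstdio>
#include <cuda_runtime.h>

// Unfused pipeline: two separate kernel launches, with the intermediate
// result round-tripping through global memory between them.
__global__ void add(const float* a, const float* b, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = a[i] + b[i];
}

__global__ void mul(const float* tmp, const float* c, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] * c[i];
}

// Fused version: one launch, no intermediate buffer, and roughly half
// the global-memory traffic for the same result.
__global__ void add_mul_fused(const float* a, const float* b,
                              const float* c, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (a[i] + b[i]) * c[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c, *tmp, *out1, *out2;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    cudaMallocManaged(&tmp, n * sizeof(float));
    cudaMallocManaged(&out1, n * sizeof(float));
    cudaMallocManaged(&out2, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; c[i] = 3.0f; }

    int threads = 256, blocks = (n + threads - 1) / threads;

    // Same math both ways; the fused path launches once and skips tmp.
    add<<<blocks, threads>>>(a, b, tmp, n);
    mul<<<blocks, threads>>>(tmp, c, out1, n);
    add_mul_fused<<<blocks, threads>>>(a, b, c, out2, n);
    cudaDeviceSynchronize();

    printf("unfused: %.1f  fused: %.1f\n", out1[0], out2[0]);  // both 9.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    cudaFree(tmp); cudaFree(out1); cudaFree(out2);
    return 0;
}

The fused kernel does the same arithmetic but touches global memory far less, which is exactly the kind of win an optimizing compiler can hunt for automatically rather than leaving to hand-tuned kernels.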
Currently, the industry’s dominant toolchain is Nvidia’s CUDA platform, an underrated element of the company’s runaway success. But many elements of the CUDA system are open source, and Luminal is betting that, with much of the industry still scrambling for GPUs, there will be a lot of value to be gained by building out the rest of the stack.
Luminal is part of a growing group of inference optimization startups that have become more valuable as companies look for faster and cheaper ways to run their models. Inference providers like Baseten and Together AI have long specialized in optimization, and smaller companies like Tensormesh and Clarifai are now emerging to focus on more specific technical tricks.
Luminal and other members of the cohort will face stiff competition from optimization teams at the major AI labs, which have the advantage of optimizing for a single family of models; Luminal, which serves outside clients, has to adapt to whatever model comes its way. But even with the risk of being squeezed out by the hyperscalers, Fioti says the market is growing fast enough that he’s not worried.
“It’s always going to be possible to spend six months fine-tuning a model architecture on a given piece of hardware, and you’re likely to beat any kind of compiler performance,” Fioti says. “But our big bet is that, for anything less than that, the universal use case is still very economically valuable.”
