Customer support and service are among the hottest sectors in voice AI right now. But building a product that sounds human and responds without noticeable lag is proving much more difficult in some markets than others – and most of the big players weren’t built with Africa and the Middle East in mind.
AethexAI, a startup founded last year to close this gap, has raised $3 million in pre-seed funding led by 4DX Ventures with participation from Enza Capital, Dorm Room Fund, Mojo Ventures and Stanford GSB 26 Fund. Individual investors include Stanford faculty, telecommunications executives and AI researchers from Anthropic.
Instead of using existing orchestration tools like Vapi and LiveKit, the company built its own small model and orchestration layer from scratch to handle the localized dialects of English, French, and Arabic spoken across its target markets—a decision driven, as we’ll get to, by the particular requirements of operating in the region.
The company is also launching a platform where businesses can test its technology and sign up for its services, along with APIs and SDKs that let developers experiment with its models.
The startup was founded by Mariama Diallo and Ayooluwa Odemuyiwa. CEO Diallo worked at Goldman Sachs and later joined YC-backed ModelML as a product and growth hire. CTO Odemuyiwa graduated from Caltech, worked at Meta and enrolled in Stanford Business School before co-founding the company. The couple wanted to build something for new markets and started looking for opportunities.
Companies around the world are racing to use AI tools to automate parts of their operations. But it doesn’t always work out. In Egypt, a call center automated a significant portion of its calls but rolled back the system due to poor results, the founders found. Several support centers in Africa told them that finding and hiring engineers to automate calls at the right cost was an ongoing headache.
“The latency and jitter that we saw on automated calls in this region was outrageous. If we had become orchestrators, we might have had to use large models hosted outside the region, resulting in higher latency. We realized that for this to work, we need to use very small models and cut latencies at every step,” Odemuyiwa told TechCrunch of the company’s decision to build its own orchestration layer.
AI labs deploying their latest models usually spend millions training them and acquiring data. AethexAI found a workaround for both. Instead of chasing the largest possible models, it decided that small models are enough to tackle the latency problem while maintaining accuracy, and developed its own Kora series, which ranges from 300 million to 1.7 billion parameters. That is a fraction of the size of the largest LLMs, which is precisely the point.
To train these models, the startup used anonymized recordings from a call center partner. It also sent hard drives to radio stations across Africa to collect more audio data. To keep costs down, it built a contributor network of university students to annotate data and pronounce local names. As a result, the startup says, it now handles more than 17,000 calls a day.
On the business side, the company guides customers new to voice AI through the process, offering on-site demos and workshops to help them identify the best use cases for automation.
“We always tell customers that we can’t be everything to everyone right now. We’re small. When we start talking to a company, we ask them to pick a use case that’s the most important for them to start [with],” Diallo said.
The startup is open to working across all industries, but a large portion of its current use cases involve debt collection calls, customer activation or KYC (Know Your Customer), the standard identity verification process used by banks and telecoms. The company is hiring forward-thinking engineers on a contract basis to serve local markets and is building channel partnerships with telecom providers to handle telephony for voice AI calls. Plug-and-play solutions, it says, simply won’t work here.
Walter Badoo, co-founder and managing partner of 4DX Ventures, argues that the Africa and Middle East market is fundamentally different from the markets most voice AI companies are built to serve.
“Companies in Africa and the Middle East process about three times as many calls as their Western counterparts, as voice remains the dominant channel for customer interaction,” he said. “The existing systems were built for Western markets characterized by high-end GPU infrastructure, standard English and European speech environments, and business workflows common in the US and Europe. This creates real gaps when businesses need systems that handle dialects, code-switching, and casual speech patterns, and that work within their existing telephony infrastructure and their actual price points.”
Put another way, while companies like ElevenLabs, Deepgram, Sierra, and Cognigy are expanding globally at a rapid pace, the markets they are built for and the markets they enter are not always the same thing. Startups like AethexAI are betting that the gaps — models specialized in local dialects, partnerships on the ground, infrastructure built for the region — represent a market opening that the giants have neither the incentive nor the architecture to close.
