DeepL, a translation company best known for its text tools, today released a voice-to-voice translation suite that covers use cases such as meetings, mobile and web conversations, and group conversations for frontline workers through custom apps. The company is also releasing an API that lets third-party developers and companies build on top of DeepL’s technology for custom use cases, such as call centers.
“After spending so many years in text translation, voice was a natural step for us,” DeepL CEO Jarek Kutylowski told TechCrunch in an interview. “We’ve come a long way when it comes to text translation and document translation. But we thought there wasn’t a great product for real-time voice translation.”
Kutylowski said the main challenge in building a real-time translation product is balancing low latency (the delay between someone speaking and the translated audio being played) against accurate results.
DeepL publishes add-ons for platforms such as Zoom and Microsoft Teams, where listeners can either hear real-time translation while others speak in their native language or follow translated text in real time on the screen. This program is currently in early access, and the company is inviting organizations to sign up for a waiting list. The company also has a product for mobile and web-based conversations that can take place in person or remotely.
DeepL also supports group conversations in settings such as training sessions or workshops, where participants can join via a QR code.
DeepL said its voice-to-voice technology can also learn and adapt to customized vocabulary, such as industry-specific terms and company and person names.
Kutylowski said AI is reshaping what customer service will look like in the coming years. He noted that a translation layer helps companies provide support in languages where qualified staff are scarce and expensive to hire.
The company said it controls the entire voice-to-voice stack. For now, the system converts speech to text, translates that text, and then converts the result back to speech. DeepL believes its years of work on text translation give it an advantage in translation quality. Going forward, the company wants to develop an end-to-end voice translation model that skips the text step altogether.
DeepL faces competition from several well-funded startups working in adjacent corners of the space. Sanas, which last year raised $65 million from Quadrille Capital and Teleperformance, uses AI to change a speaker’s accent in real time — a tool primarily aimed at call center agents.
Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies, helping them dub and localize video content at scale.
Palabra, backed by Reddit co-founder Alexis Ohanian’s company Seven Seven Six, is building a real-time speech translation engine designed to preserve both meaning and the speaker’s original voice, putting it in more direct competition with what DeepL is now building.
