DeepL has made a name for itself with online text translation that it claims is more precise and accurate than services offered by the likes of Google — an offering that has propelled the German startup to a $2 billion valuation and more than 100,000 paying customers.
Now, as the hype for AI services continues to grow, DeepL is adding another mode to the platform: voice. Users will now be able to use DeepL Voice to listen to someone speaking in one language and automatically translate it into another language in real time.
English, German, Japanese, Korean, Swedish, Dutch, French, Turkish, Polish, Portuguese, Russian, Spanish and Italian are the languages that DeepL Voice can “hear” today. Translated captions are available in all 33 languages that DeepL Translator currently supports.
Notably, DeepL Voice does not currently deliver its results as audio or video files: the service is aimed at live conversations and real-time video conferencing, and the output comes in the form of text, not audio.
For live conversations, you can set translations to appear “mirrored” on a smartphone — the idea is that you place the phone between you on a conference table so each person can read the translated words — or as a side-by-side transcript that you share with someone. In video conferencing, translations appear as subtitles.
This may change over time, as Jarek Kutylowski, the company’s founder and CEO (pictured above), hinted in an interview. This is DeepL’s first voice product, but it’s unlikely to be its last. “[Audio] is where translation will play out in the next year,” he added.
There is other evidence to support this. Google — one of DeepL’s biggest competitors — has also begun integrating real-time translated captions into its Meet video conferencing service. And there are plenty of AI startups building voice translation services, such as ElevenLabs, which specializes in AI voice and dubbing, and Panjaya, which creates translated videos using “deepfake” voices and footage adjusted to match the audio.
The latter uses the ElevenLabs API, and according to Kutylowski, ElevenLabs itself uses DeepL technology to power its translation service.
Audio output isn’t the only feature yet to come.
There is also no API for the voice product at the moment. DeepL’s main business is focused on B2B, and Kutylowski said the company works with partners and customers directly.
Nor is there a wide choice of integrations: the only video calling service that currently supports DeepL Voice is Teams, which “covers most of our customers,” Kutylowski said. There is no word on when, or whether, Zoom or Google Meet will integrate DeepL Voice.
The product will feel like a long time coming for DeepL users, and not just because the market has been flooded with other voice AI services aimed at translation. Voice has been the top request from customers since 2017, the year DeepL launched, Kutylowski said.
Part of the reason for the wait is that DeepL has taken a very deliberate approach to building its products. Unlike many players in the world of AI applications, which build on and tweak other companies’ large language models (LLMs), DeepL aims to build its services from the ground up. In July the company released a new LLM optimized for translation that it says outperforms GPT-4, as well as the models offered by Google and Microsoft, not least because its primary purpose is translation. The company has also continued to improve the quality of its written output and its glossary.
Likewise, one of DeepL Voice’s selling points is that it works in real time. That matters because many “AI translation” services on the market actually operate with a delay, making them difficult or impossible to use in live situations — the very use case DeepL is targeting.
Kutylowski hinted that this is another reason the new voice product focuses on text translations: text can be computed and delivered very quickly, while AI processing techniques still have a long way to go before they can produce translated audio and video just as fast.
Video conferencing and meetings are the obvious use cases for DeepL Voice, but Kutylowski noted that another major case the company envisions is in the service industry, where front-line workers in restaurants, for example, could use the service to communicate with customers more easily.
That could be helpful, but it also highlights one of the thornier aspects of the service. In a world where we’re all suddenly more aware of data protection and how new services and platforms co-opt private or proprietary information, it remains to be seen how keen people will be to have their voices captured and used in this way.
Kutylowski insisted that although voices are sent to its servers for translation (there is no on-device processing), its systems retain nothing and the audio is not used to train its LLMs. Ultimately, DeepL will work with its customers to make sure they do not violate GDPR or other data protection regulations.