There is a huge opportunity for generative AI in the world of translation, and a startup called Panjaya is taking the concept to the next level: a hyper-realistic, AI-based dubbing tool for video that recreates a speaker's original voice in the new language, automatically adjusting the video, including the speaker's physical movements, to match the new speech patterns naturally.
After lying low for the past three years, the startup has unveiled BodyTalk, the first version of its product, along with its first external funding of $9.5 million.
Panjaya is the brainchild of Hilik Shani and Ariel Shalom, two deep learning specialists who spent most of their careers quietly working on deep learning technology for the Israeli government and are now, respectively, the startup's general manager and CTO. They hung up their G-man hats in 2021 with the startup itch, and 1.5 years ago Guy Piekarz joined them as CEO.
Piekarz is not a founder of Panjaya, but he is a notable name: in 2013, he sold Matcha, a startup he founded, to Apple. Matcha was a buzzy early player in video discovery and recommendation, and it was acquired during the early days of Apple's TV and streaming strategy, when those efforts were more rumors than actual products. Matcha sold for a song, between $10 million and $15 million, which looks modest considering Apple's eventual significant push into streaming media.
Piekarz stayed with Apple for nearly a decade, building Apple TV and then its sports business. He was introduced to Panjaya by Viola Ventures, one of its backers (others include R-Squared Ventures, JFrog co-founder and CEO Shlomi Ben-Haim, Chris Rice, Guy Schory, Ryan Floyd of Storm Ventures, Ali Behnam of Riviera Partners, and Oded Vardi).
“I had left Apple by then and was planning to do something completely different,” Piekarz said. “However, I was blown away by seeing the technology demo, and the rest is history.”
BodyTalk is interesting because it brings together several technologies, each playing on a different aspect of synthetic media, into a single framework.
It starts with translation, currently available in 29 languages. The translation is then spoken in a voice that mimics the original speaker's, and that audio is grafted onto a copy of the original video, with the speaker's lips and other movements modified to fit the new words and phrasing. All of this is generated automatically after users upload videos to the platform, which also comes with a dashboard that includes further editing tools. Future plans include an application programming interface (API), as well as moving closer to real-time processing. (Currently, BodyTalk works “nearly in real time,” taking minutes to process videos, Piekarz said.)
“We use best-of-breed where we need to,” Piekarz said of the company's use of third-party large language models and other tools. “And we build our own AI models where the market doesn't have a real solution.”
One example of this, he continued, is the company’s lip sync. “Our lip sync engine was developed entirely by our AI research team, because we haven’t found anything that reaches this level and quality for multiple speakers, angles and all the commercial use cases we want to support.”
Its focus currently is B2B only; clients include JFrog and the media organization TED. The company plans to expand further into media, specifically in areas such as sports, education, marketing, healthcare, and medicine.
The resulting dubbed videos are quite uncanny, not unlike deepfakes, though Piekarz frowns on the term, which has gained negative connotations over the years that are the exact opposite of the market the startup is targeting.
“Deepfakes are not something we care about,” he said. “We are looking to avoid that name entirely.” Instead, he said, think of Panjaya as part of the “deep real” category.
He added that by targeting only the B2B market, and controlling who can access its tools, the company creates “guardrails” around the technology to protect against misuse. He also believes that in the longer term, more tools will be created, including watermarks, to help detect when videos have been edited into synthetic media, whether legitimately or nefariously. “We definitely want to be a part of that and not allow misinformation,” he said.
Dubbing is not dead
A number of startups compete with Panjaya in the broader field of AI-based video translation, including big names like Vimeo and ElevenLabs, as well as smaller players like Speechify and Synthesia. For all of them, building better dubbing can feel like swimming against a strong current, because captions have become a pretty standard part of how video is consumed these days.
On TV, this is due to a combination of factors: weak speakers, background noise in our busy lives, mumbling actors, limited production budgets, and heavier sound effects. CBS found in a survey of American television viewers that more than half kept subtitles on “some (21%) or all (34%) of the time.”
But some people like captions simply because they are fun to read, and a whole fandom has built up around them.
On social media and other apps, captions are often simply baked into the experience. TikTok, for example, began turning on captions by default on all videos in November 2023.
However, there is still a huge international market for dubbed content, and even if English is often assumed to be the lingua franca of the internet, there is evidence from research groups such as CSA Research that content delivered in native languages gets better engagement, especially in a B2B context. Panjaya's bet is that more natural native-language content can achieve better results still.
Some of its clients seem to support this theory. TED says talks dubbed using Panjaya's tools saw a 115% increase in views, with completion rates for translated videos doubling.