AI multi-speaker lip-sync has arrived
Published by Thomas Tillman in Artificial Intelligence News · Thursday 20 Apr 2023
Tags: asianheritagesociety.org, asianheritageawards.com, wordslingerbook.com, asian, american, san, diago, ai, technology
Rask AI, an AI-powered video and audio localisation tool, has announced the launch of its new Multi-Speaker Lip-Sync feature. With AI-powered lip-sync, its 750,000 users can translate their content into more than 130 languages and sound as fluent as a native speaker.
For a long time, lip movements and voices in dubbed content have been out of sync. Experts believe this is one of the reasons why dubbing is relatively unpopular in English-speaking countries. Lip movements that match the audio make localised content more realistic and therefore more appealing to audiences.
A study by Yukari Hirata, a professor known for her work in linguistics, found that watching lip movements (rather than gestures) helps learners perceive difficult phonemic contrasts in a second language. Lip reading is also one of the ways we learn to speak in general.
Today, with Rask’s new feature, it’s possible to take localised content to a new level, making dubbed videos more natural.
The AI automatically restructures the lower face based on references. It takes into account how the speaker looks and what they are saying to make the end result more realistic.
How it works:
- Upload a video with one or more people in the frame.
- Translate the video into another language.
- Press the ‘Lip Sync Check’ button and the algorithm will evaluate the video for lip sync compatibility.
- If the video passes the check, press ‘Lip Sync’ and wait for the result.
- Download the video.
According to Maria Chmir, founder and CEO of Rask AI, the new feature will help content creators expand their audience. The AI visually adjusts lip movements to make a character appear to speak the language as fluently as a native speaker.
The technology is based on a generative adversarial network (GAN), which consists of two parts: a generator and a discriminator. The two compete with each other, each trying to stay one step ahead. The generator produces the content (lip movements), while the discriminator is responsible for quality control, judging whether that content looks real.
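Rask has not published its model, so the following is only a toy sketch of the generic generator-versus-discriminator loop, not the company's implementation. It assumes one-dimensional numbers in place of video frames, a two-parameter linear generator and a logistic discriminator; every name and value here (`generate`, `discriminate`, `train`, the target mean of 4.0) is hypothetical and chosen purely for illustration.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def generate(g, z):
    """Toy generator: maps a noise value z to a fake sample a*z + b."""
    a, b = g
    return a * z + b


def discriminate(d, x):
    """Toy discriminator: estimated probability that sample x is real."""
    w, c = d
    return sigmoid(w * x + c)


def discriminator_step(d, real, fake, lr):
    """One gradient step on the loss -log D(real) - log(1 - D(fake))."""
    w, c = d
    grad_w = grad_c = 0.0
    for x in real:
        p = discriminate(d, x)
        grad_w += -(1.0 - p) * x / len(real)
        grad_c += -(1.0 - p) / len(real)
    for x in fake:
        p = discriminate(d, x)
        grad_w += p * x / len(fake)
        grad_c += p / len(fake)
    return (w - lr * grad_w, c - lr * grad_c)


def generator_step(g, d, noise, lr):
    """One gradient step on the loss -log D(G(z)): try to fool the discriminator."""
    a, b = g
    w, _ = d
    grad_a = grad_b = 0.0
    for z in noise:
        p = discriminate(d, generate(g, z))
        dloss_dx = -(1.0 - p) * w  # gradient of the loss w.r.t. the fake sample
        grad_a += dloss_dx * z / len(noise)
        grad_b += dloss_dx / len(noise)
    return (a - lr * grad_a, b - lr * grad_b)


def train(steps=200, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on toy Gaussian data."""
    rng = random.Random(seed)
    g, d = (1.0, 0.0), (0.0, 0.0)  # initial generator and discriminator parameters
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # stand-in "real" data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generate(g, z) for z in noise]
        d = discriminator_step(d, real, fake, lr)
        g = generator_step(g, d, noise, lr)
    return g, d


if __name__ == "__main__":
    generator, discriminator = train()
    print("generator (a, b):", generator)
    print("discriminator (w, c):", discriminator)
```

Each round, the discriminator is first nudged to tell real samples from generated ones, and the generator is then nudged so that its output scores as more "real". A production lip-sync model would use deep networks over video and audio rather than two scalars, and a toy setup like this one can oscillate instead of settling.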
The beta release is available to all Rask subscription customers.
(Editor’s note: This article is sponsored by Rask AI)