Microsoft Scares: VALL-E Can Emulate Any Voice

Ece Nagihan 10 January 2023


Microsoft recently unveiled VALL-E, an artificial intelligence tool that can imitate people’s voices. It can mimic any voice, including the speaker’s emotions and tone.

Although VALL-E is a very capable tool, it has already raised a number of concerns, because it can imitate any voice from a very short audio sample.

VALL-E Says What Was Never Said

VALL-E can mimic and reproduce human voices. The tool was trained on 60,000 hours of English speech data and needs only a 3-second sample of a speaker’s voice to generate new content. Unlike many AI tools, VALL-E can replicate the speaker’s emotions and tone, even when creating a recording of words the original speaker never said.

A short paper on VALL-E has been published through Cornell University, and several voice samples synthesized with the tool are already available on GitHub. The results are impressive, but they are also alarming. Listening to the samples, some sound a little robotic, while others are almost perfect. It should not be forgotten, however, that each of these samples was generated from an audio clip only 3 seconds long. At the same time, the 60,000-hour dataset is still not a huge amount of data.

Currently, VALL-E is not available for public use, which might actually be a good thing, because AI-generated copies of people’s voices could be misused by internet trolls and others.


VALL-E Can Create Dangerous Situations

As VALL-E and similar systems become more realistic and impressive, they also raise ethical concerns. For example, a fake audio recording produced with such a system could have serious consequences: politicians and other public figures could be impersonated for malicious purposes. It also raises a host of security issues, for example at banks that use voice verification.

AI-based voice imitation can of course also be used for good. Audiobooks read in the voice of a favorite artist are one example. Or you could have any song sung again by your favorite artist. Of course, that is still some time away.