By Stephen Beech

AI-generated “deepfake” voices are now indistinguishable from real human voices, warns new research.

The study shows that the average listener can no longer distinguish between computer-simulated voices and those of real human beings.

Many people still think of AI-generated speech as sounding “fake” or unconvincing and easily told apart from human voices, say scientists.

But the Queen Mary University of London (QMUL) study shows that AI voice technology has now reached a stage where it can create “voice clones” or deepfakes which sound just as realistic as human recordings.

The study, published in the journal PLOS One, compared real human voices with two different types of synthetic voices, generated using state-of-the-art AI voice synthesis tools.

Some were “cloned” from voice recordings of real humans, intended to mimic them, while others were generated from a large voice model and did not have a specific human counterpart.

Study participants were asked to evaluate which voices sounded most realistic and which sounded most dominant or trustworthy.

The research team also looked at whether AI-generated voices had become “hyperreal,” given that some studies have shown that AI-generated images of faces are now judged to be human more often than images of real human faces.

While the study did not find a “hyperrealism effect” from the AI voices, it did show that voice clones can sound as real as human voices, making it difficult for listeners to distinguish between them.

Both types of AI-generated voices were evaluated as more dominant than human voices, and some were also perceived as more trustworthy.

Study co-leader Dr. Nadine Lavan, senior lecturer in psychology at QMUL, said: “AI-generated voices are all around us now.

“We’ve all spoken to Alexa or Siri, or had our calls taken by automated customer service systems.

“Those things don’t quite sound like real human voices, but it was only a matter of time until AI technology began to produce naturalistic, human-sounding speech.

“Our study shows that this time has come, and we urgently need to understand how people perceive these realistic voices.”

Dr. Lavan pointed out how easily and quickly the team had been able to create clones, or deepfakes, of real voices (with the consent of their owners) using commercially available software.

She said: “The process required minimal expertise, only a few minutes of voice recordings, and almost no money.

“It just shows how accessible and sophisticated AI voice technology has become.”

Dr. Lavan said the pace of improvement has been “very rapid” and carries implications for ethics, copyright, and security, especially in areas such as fake news, fraud, and impersonation.

But she added: “The ability to generate realistic voices at scale opens up exciting opportunities.

“There might be applications for improved accessibility, education, and communication, where bespoke high-quality synthetic voices can enhance user experience.”

Originally published on talker.news, part of the BLOX Digital Content Exchange.
