By Stephen Beech
AI voice clones are easier to understand than humans, according to a new study.
Voice clones can recreate a person's speech using just a few seconds of recorded audio.
Researchers found the AI copy is more intelligible in noisy environments, but they are unsure why.
Synthetic voices are increasingly part of our lives, from digital assistants such as Siri and Alexa to automated telemarketers and answering machines.
Researchers from University College London (UCL) and the University of Roehampton evaluated the intelligibility of human voices and voice clones.
Their findings, published in The Journal of the Acoustical Society of America (JASA), showed that voice clones were up to 20% more intelligible than humans in noisy environments.
The research team explained that voice clones differ from traditional synthetic voices in the amount of sampling they require.
Synthetic voices such as Siri require a voice actor to spend hours in a recording booth.
But a voice clone can be made from as little as 10 seconds of speech, expanding the number of potential voices as well as the number of potential applications.
Researchers Patti Adank, of UCL, and Han Wang, of the University of Roehampton, specialize in studying human perception of unclear speech and were fascinated by the idea of machine-replicated speech.
A key question they were looking to answer was just how easy voice clones are for the average person to understand.
They suspected that voice clones would simply be poor representations of actual human voices — and that people would struggle to understand them.
But Adank says what they found could not be more different.
She said: "I thought initially that voice clones would be less intelligible because they were unfamiliar.
"I found they were up to 20% more intelligible, which was quite shocking.
"A small part of our paper is talking about that experiment, and then a large part is me and my collaborator frantically trying to find out what it is that makes those voice clones more intelligible."
The duo initially presented volunteer participants with human voices and voice clones, asking them to rate their intelligibility.
After finding that voice clones were consistently rated easier to understand, they repeated the experiment three more times: with elderly volunteers, to determine whether being hard of hearing alters the effect; with American volunteers, to judge whether accent plays a role (the original participants were British); and with a filter designed to mimic cochlear implants.
In every case, voice clones emerged victorious.
After examining over 100 acoustic measurements, Adank believes the only way to solve the mystery is to work with collaborators who specialize in text-to-speech systems to adapt an existing open-source cloning system.
She added: "I am now going to try and recreate [the effect] by studying how synthesizers work and how they use digital signal processing to generate those voices, just to get a bit of a handle on this."