Researchers race to keep up with improving AI voice clones — and prevent scams

By Ulaa Kuziez

Published June 4, 2024 at 5:18 PM CDT

Ning Zhang, assistant professor of computer science and engineering at Washington University, poses for a portrait on Friday, May 31, 2024. He's one of three winners of the Federal Trade Commission's Voice Cloning Challenge, announced in April. His program, DeFake, disrupts AI attempts at cloning an audio clip by embedding subtle adversarial noise into voice recordings.

As a kid, Ning Zhang wasn’t very good at playing video games. When it came to computers, though, he was very savvy. He learned how to modify computer systems so he could win every time.

“As I grew older, I started developing that sense of responsibility. ‘If I know how to do this well, I should use my skill to do good things for humanity,’” said Zhang, an assistant professor of computer science and engineering at Washington University.

Zhang went on to a career that included years of work in cybersecurity. In April, the Federal Trade Commission recognized his recent research on voice cloning and on protecting people from deepfake scams, along with the work of two others, in its Voice Cloning Challenge.

The project, called DeFake, aims to protect audio from being cloned by embedding subtle noise and other distortions into voice recordings. Essentially, Zhang’s program adds watermarks to audio in order to sabotage AI cloning attempts.

Scammers need just a few seconds of a person’s voice — usually taken from clips uploaded to social media — to impersonate them. Scammers typically target people over 60, begging them for help over a false emergency in order to get victims to turn over financial information.

In 2023, consumers lost nearly $2.7 billion to impersonation scams in part due to artificial intelligence, according to the FTC.

That’s why Zhang is working to make the DeFake program accessible for everyday use.

“If you get an unknown number from St. Louis [and] pick it up and say ‘Hello, who's there?’ this is enough to copy your voice,” Zhang said. “So we can embed this protection in the phone so when you talk to unknown numbers, it will add this perturbation or adversarial noise so scammers cannot [clone] your voice.”

Elaine Cha's deepfake voice

Elaine Cha's real voice

A lot of questions remain about how artificial intelligence will affect the world, Zhang said. In an election year, experts are especially worried about how deepfake content might be used in political campaigns to target or sway voters.

Zhang understands people’s apprehension about machine learning technology. “I share that fear,” he said. Still, he encourages people to lean into AI’s benefits, which include spell-checking essays and emails, forecasting the weather and predicting surgery outcomes.

“It's more important to think about how we can ride the wave of AI, how we can leverage AI to improve our productivity and improve our life quality. So I will think about myself, as you know, getting a partner who can work 24/7 and will not complain to me,” Zhang said.

For more on how to detect AI scams, visit the FTC’s guide here.

To learn more about Ning Zhang’s DeFake technology and how artificial intelligence works, listen to the full St. Louis on the Air conversation on Apple Podcasts, Spotify and YouTube.


“St. Louis on the Air” brings you the stories of St. Louis and the people who live, work and create in our region. The show is produced by Ulaa Kuziez, Miya Norfleet, Emily Woodbury, Danny Wicentowski, Elaine Cha and Alex Heuer. Roshae Hemmings is our production assistant. The audio engineer is Aaron Doerr.