
Voicing concerns: The future of AI voice replacement

AILSA CHANG, HOST:

Thanks to advancements in artificial intelligence, computers are starting to sound like real humans. AI can now be trained to realistically imitate the voices of celebrities, loved ones, even our co-workers. Kenny Malone and Jeff Guo from NPR's Planet Money podcast set out to explore this brave new world of artificial intelligence.

KENNY MALONE, BYLINE: The singer Grimes may have turned herself into the most efficient pop star on the planet.

(SOUNDBITE OF SONG, "COLD TOUCH (KITO FEAT. GRIMESAI)")

AI-GENERATED VOICE #1: (As Grimes, singing) Fool me like I'm yours forever. You could be the one to make me cry.

JEFF GUO, BYLINE: That is a song made with an AI version of Grimes' singing voice. That voice was entirely generated by artificial intelligence.

MALONE: Yeah. Earlier this year, she announced that she'd let anybody make a song using her AI voice as long as they split the royalty checks with her.

GUO: And in just a few weeks, more than 300 new songs had been created using Grimes' AI voice.

MALONE: Which means that Grimes may have been at home, sipping a mojito, while her AI clone voice worked hard to sing this song...

(SOUNDBITE OF MUSIC)

AI-GENERATED VOICE #1: (As Grimes, singing) Me again, you (ph)...

MALONE: ...And this song...

(SOUNDBITE OF MUSIC)

AI-GENERATED VOICE #1: (As Grimes, singing) We are both (ph)...

MALONE: ...And this song.

(SOUNDBITE OF MUSIC)

AI-GENERATED VOICE #1: (As Grimes, singing) I want to hold you (ph)...

GUO: And this is a fascinating new business model for a pop star - using AI to basically multiply yourself.

AI-GENERATED VOICE #2: (As Grimes) I realized this was pretty accessible technology. And, to be honest, I didn't even think about it or ask my team. I just tweeted that people could use my voice, and it went viral.

MALONE: That is Grimes - well, sort of.

GUO: Right. So Grimes couldn't do a sit-down interview with us, but she did answer some of our questions over email. And she agreed to have an AI replica of her own speaking voice read her answers out loud - because the future is now, and it is weird.

MALONE: It's very weird. And Grimes told us that letting people clone her singing voice has actually felt like a way to let less famous musicians sort of borrow her voice and her fame to help build their budding careers.

AI-GENERATED VOICE #2: (As Grimes) I run into absurdly creative humans all the time, but not a lot of people get to be artists. A lot of luck is involved in that. It's hard to build a fan base, and it's hard to get your work in front of the public. So if there's ways to reduce these algorithmic barriers by letting people inhabit my being, then I think we're moving in a direction I really like.

GUO: Again, that was an AI Grimes reading words sent to us by the real Grimes.

MALONE: And, you know, clearly, Grimes sees this AI voice technology as an opportunity. But as economics reporters, we were a little bit more skeptical here.

GUO: Yeah - because, you know, automation has often been a harsh part of history. It creates winners and losers.

MALONE: And, you know, as people who talk into microphones for a living - like we're doing right now - this automation of the voice is troubling. Like, will we be winners like Grimes, or could this technology just replace Jeff and me altogether?

GUO: And so we figured we should run a test to see - can this technology replace us?

MALONE: Or more specifically, our colleague.

(SOUNDBITE OF ARCHIVED NPR BROADCAST)

ROBERT SMITH, BYLINE: Hello, and welcome to Planet Money. I'm Robert Smith. Today on the show, a series of unfortunate events.

GUO: That is a real recording of the real Robert Smith, who gave us permission to try and clone his voice.

MALONE: Then we found a company called WellSaid Labs, who - full disclosure - agreed to help us for free with this. They asked us for even more recordings of the real Robert Smith.

(SOUNDBITE OF ARCHIVED NPR BROADCASTS)

SMITH: And I'm Robert Smith.

And I'm Robert Smith.

And I'm Robert Smith.

MALONE: We sent those recordings over to WellSaid with transcripts of all the words he said, which were more than just his name, to be clear.

GUO: And then, a couple weeks later, WellSaid jumped on a Zoom with us.

RHYAN JOHNSON: Hey, Kenny, how's it going?

MALONE: We're very excited.

JOHNSON: Yeah. Let's start with...

GUO: That is Rhyan Johnson - she is the senior engineer who was in charge of creating our little synthetic Robert Smith.

JOHNSON: So what we wanted to do was kind of show the process from beginning to what synthetic Robert sounds like in the end.

MALONE: Yes.

JOHNSON: So step zero takes...

GUO: Rhyan explained that, in the beginning, she had a computer look at some written text, and then the computer guessed how Robert Smith might say those words. And then it checked that guess against a real recording of Robert to see how far off it was. And then it tweaked its approach, tried it again, over and over and over.

MALONE: And Rhyan wanted to show us how that synthetic Robert voice had evolved through the process.

JOHNSON: So if you're ready to hear step zero, here we go.

AI-GENERATED VOICE #3: (As Robert Smith, inaudible).

MALONE: (Laughter) It sounds like a swarm of bees are coming out of Robert's mouth.

JOHNSON: That's right. It's gibberish.

GUO: But after about 100,000 training cycles, the voice got better.

AI-GENERATED VOICE #3: (As Robert Smith) I'm synthetic Robert, and I'm not here to make friends. I'm here to make history.

MALONE: What?

GUO: That is pretty good.

JOHNSON: And here's the final final.

AI-GENERATED VOICE #3: (As Robert Smith) I'm synthetic Robert, and I'm not here to make friends. I'm here to make history.

MALONE: Holy crap.

GUO: Oh, my God. That last part, it sounds...

MALONE: Oh, my God. That's bonkers.

JOHNSON: (Laughter).

MALONE: And with that, like, AI program done and written, now Jeff and I can log into WellSaid's website, type in something that we want synthetic Robert to say and - boom.

AI-GENERATED VOICE #3: (As Robert Smith) The rain in Spain falls mainly on the plain.

MALONE: I mean, put him in "My Fair Lady" right now.

GUO: It's pretty good, right?

MALONE: I mean, I can't tell the difference, honestly. And because of how much this sounds like the real Robert, WellSaid absolutely required his permission. They can monitor everything we have his voice clone say, and they're going to shut synthetic Robert down for good after our little experiment.

GUO: An experiment with one final test - we called up the real Robert Smith.

SMITH: Hello, Kenny Malone and Jeff. How's it going?

MALONE: It's good.

GUO: It's going great.

MALONE: Hey, wait - what's happening here? Somebody else is joining here.

AI-GENERATED VOICE #3: (As Robert Smith) What is up, organic life forms? Synthetic Robert has entered the chat.

SMITH: What the hell?

AI-GENERATED VOICE #3: (As Robert Smith) I just want to say, Robert, you - or should I say we - have a beautiful voice.

(LAUGHTER)

MALONE: Robert's hands are on his face.

SMITH: Ah. Ah. Ah.

MALONE: Robert is screaming.

GUO: He's massaging his forehead.

MALONE: Are you OK?

SMITH: That's better than I thought. It's got the pauses and the corrections. And I guess I should be a little bit freaked out, but, like, the first place my mind goes is this could allow me to be lazier than I am...

(LAUGHTER)

SMITH: ...Which is - how do I do the fun stuff and make robot me do the chores?

MALONE: And, you know, you could imagine if, like, you're a celebrity - you know, have your voice clone read your TV commercials or narrate your memoir or, like, whatever, and you just lounge on the beach. That is a nice future.

GUO: But it's clear that this technology is also good enough to create, like, stock voices of fictional people.

MALONE: Yeah, fictional people that you wouldn't actually have to pay.

GUO: And that's one of the many concerns that people have about the future uses of this technology. It's part of why AI has become a huge issue for actors and even screenwriters.

MALONE: And even radio reporters. But in our specific case, Planet Money is not going to replace us with synthetic Robert anytime soon - you know, especially since, again, it's going to be destroyed after this experiment.

GUO: But before he gets deleted forever, we did want to give synthetic Robert a final moment with the real Robert.

AI-GENERATED VOICE #3: (As Robert Smith) Seriously, Robert - such a big fan.

SMITH: I'm a big fan of yours. You're sounding great.

AI-GENERATED VOICE #3: (As Robert Smith) Oh, thank you. That means so much coming from you.

SMITH: Well, listen, I feel like I've been a little bit of a mentor to you, really, and I think you have a big future - I mean, until they destroy you and bury you in a pit out back from some Silicon Valley...

AI-GENERATED VOICE #3: (As Robert Smith) Wait, what about burying me?

SMITH: No, no. Ignore the burying part. Live in the moment. That's one thing that I've learned as a radio host.

MALONE: Kenny Malone.

GUO: Jeff Guo.

AI-GENERATED VOICE #3: (As Robert Smith) Synthetic Robert Smith, NPR News.

(SOUNDBITE OF MUSIC)

Transcript provided by NPR, Copyright NPR.

Kenny Malone
Kenny Malone is a correspondent for NPR's Planet Money podcast. Before that, he was a reporter for WNYC's Only Human podcast. Before that, he was a reporter for Miami's WLRN. And before that, he was a reporter for his friend T.C.'s homemade newspaper, Neighborhood News.
Jeff Guo
Jeff Guo (he/him) is a co-host and reporter for Planet Money, NPR's award-winning podcast that finds creative, entertaining ways to make sense of the complicated forces that move our economy. He joined the team in 2022.