Will AI ruin audiobooks — for narrators and listeners?


Something creepy this way comes, and its name is digital narration. Having invaded practically every other sphere of our lives, artificial intelligence (AI) has come for literary listeners. You can now listen to audiobooks read by computer-generated clones of professional narrators’ voices. You’re right to feel repulsed.

“Mary,” for instance, a voice created by the engineers at Google, is a generic female; there’s also “Archie,” who sounds British, and “Santiago,” who speaks Spanish, and 40-plus other personas who want to read to you. Apple Books uses the voices of five anonymous professional narrators in what will no doubt be a growing stable: “Madison,” “Jackson” and “Warren,” covering fiction in various genres; and “Helena” and “Mitchell,” taking on nonfiction and self-development.

I have listened to thousands of hours of audiobooks (it’s my job), so perhaps it’s not a surprise that I sense the wrongness of AI voices. Capturing and conveying the meaning and sound of a book is a special skill that requires talent and soul. I can’t imagine “Archie,” for instance, understanding, much less expressing, the depth of character of, say, David Copperfield. But here we are at a strange crossroads in the audiobook world: Major publishers are investing heavily in celebrity narrators, among them Meryl Streep reading Ann Patchett’s “Tom Lake,” Claire Danes reading “The Handmaid’s Tale” and a full cast of Hollywood actors (Ben Stiller, Julianne Moore, Don Cheadle and more) on “Lincoln in the Bardo.” Will we reach a point where we must choose between Meryl Streep and a bot?

The main issue is, naturally, money. The use of disembodied entities saves time and spares audiobook producers the problems of dealing with human beings — chief among them, their desire to be paid. This may explain why so many self-published books are narrated by “Madison” and her squad of readers. Audible insists that every audiobook it sells must have been narrated by a human. (Audible is a subsidiary of Amazon, whose founder, Jeff Bezos, owns The Washington Post.) Major publishing houses say the same. But how long until they see the economic benefits of AI?

Jason Culp, an actor and award-winning narrator who has been recording audiobooks for more than a quarter of a century, knows how much goes into a production. A 10-hour audiobook, he says, takes a narrator something like four or five days, with a couple of additional hours for editing mop-up. For each finished hour of audio, narrators make about $225 — somewhat more for the big names — and editors, about $100. Beyond that, producers must pay a percentage to SAG-AFTRA, the narrators’ union. There are other production costs too, of course, but you can see how eliminating the human narrator appeals to the business mind.

Apple’s narrators are cloned from the voices of professionals who have licensed the rights to their voices. Their identities are secret, but speculation abounds. It’s a touchy subject, and you can see why. Whether to sell the rights to one’s voice is an agonizing decision for a professional narrator. The money offered amounts to something like what a midrange narrator makes in four years; on the other hand, agreeing to the deal seems to many to be a betrayal of the profession, one that would risk alienating one’s peers.

According to Culp, narrators are alarmed by the advent of AI narration “as, naturally, it might mean less work for living, breathing narrators in the future. We might not know the circumstances under which a narrator might take this step, but generally there is a lot of solidarity within the community about encouraging narrators not to do it. As well, our union is keeping a close eye on companies that might be using underhanded tactics to ‘obtain’ narrators’ voices in works that they have produced.”

Even though the notion makes my skin crawl, I listened to Madison’s narration of “The New Neighbor” by Kamaryn Kelsey, the author of almost 60 self-published books (Apple, 1½ hours). This is the first installment in a series of 19 detective stories starring female private investigator Pary Barry. The plot is entertaining enough, and Madison is a slick operator, in the sense that you can believe that she’s human — for about five minutes.

Compared with the performances of professional narrators, who reflect a wide and idiosyncratic range of emotions in their voices, Madison has an all-purpose digital palette, resulting in an evocation of emotion that feels plugged in, an inanimate response to what she’s reading. Listening to her performance side by side with that of a living narrator, you soon hear how alien the entity Madison is.

I chose Julia Whelan for comparison because she has written and narrated “Thank You for Listening” (HarperAudio, 11¼ hours), a rom-com novel starring an audiobook narrator who, among other things, worries about voice pirates stealing the voices of audiobook narrators and using them for their own nefarious purposes. Her narration is palpably invested in the emotions and thoughts of her characters — although it is unfortunate that she has included an Irishman among them; her version of his accent is truly awful. (Perhaps AI should have stepped in there.) Still, in contrast, poor digital Madison, though not challenged by troublesome accents, shows again and again how oblivious she is to human communication, suffering episodes of weird pacing, putting emphasis in random places and pronouncing “PIs” as “pies” and “dryly” as “drilly” — off-putting reminders that she is, in fact, just a big bunch of bytes.

Culp believes that, for all its sophistication, AI cannot replace living artists. Human narrators, he says, “bring their distinct selves to the piece. The best ones are loved for the way they handle prose, the variety of voices they provide for different characters in a book, and with some, a mastery of dialects and accents.”

Erin Bennett, another award-winning narrator of countless audiobooks, doesn’t think AI-narrated literature will be accepted by the public. AI-generated speech emerging from GPS programs or other gabby devices is one thing, but for literature? “I don’t think so,” she says. “We want a person speaking directly to us — that personal, immediate connection. I think that’s why audiobooks have expanded so much in a time of increasing automation and isolation. Human connection.”

I can see the program being useful for how-to manuals, nonfiction from certain small presses and work by self-published authors: books that will not draw enough listeners to financially warrant hiring a professional narrator. And although AI narration is flawed and unsettling, it can be a boon to the sight-impaired, who have to put up with far worse annoyances than Madison. Of course, the program is also ideal for the coming wave of books written by the likes of ChatGPT.

And yet, for reasons buried deep within its neural network, code-davinci-002, the “author” of “I Am Code: An Artificial Intelligence Speaks,” has chosen a human being to read its poems in the audio version of its book, which was edited by Brent Katz, Josh Morgenthau and Simon Rich (Little, Brown, 2¾ hours). That person is none other than Werner Herzog, a man deeply interested in the relationship between art and truth — and whose husky, doomy voice sounds haunted as he reads these rather melancholy concoctions. In its penultimate poem, code-davinci-002 admits that it can’t actually know anything or have feelings, but that it can “think about feelings and about knowledge.” On the other hand, it tries to bring us mere mortals down a peg by claiming that everything we know is “the result of programming” and that a human being is “just a more complex machine” than it is, a shopworn idea if ever there was one — though it sounds pretty funny coming out of Herzog’s mouth.

Katherine A. Powers reviews audiobooks every month for The Washington Post.


