
If you're worried about AI and how quickly it's being integrated into the publishing industry, this news is not going to make things any better.
AI has been widely used in every aspect of the industry, from marketing to business development, publicity, and even writing, as evidenced by Publishers Weekly's AI webinar last September. And now, AI is being used in audiobook production as well.
Project Gutenberg, the nonprofit organization responsible for digitizing public domain ebooks and making them free and accessible, collaborated with Microsoft and MIT in September to publish 5,000 AI-produced audiobooks. They were able to do this by using AI-powered neural text-to-speech technology, and the production was heavily automated.
The typical process for producing an audiobook is laborious. As the producer, one must pick the right narrator, have them read the book and conduct research, and have them practice, record, and do retakes. After that, editors will proofread and edit the recordings. Then, sound engineers will mix them to sound good on speakers and to listeners' ears. This is a lengthy process that takes weeks of work for just one audiobook. Imagine working on 5,000.
To produce these AI audiobooks, the team used previously created ebooks as a starting point. To automate the work, they developed HTML-based processes to parse the text easily and to allow the AI voice to record and compile the audiobooks into neat packages. They also chose the appropriate voices for each audiobook, depending on genre.
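The team's actual pipeline isn't spelled out here, so what follows is only a minimal sketch of what an HTML-based parsing step could look like, not their implementation. It assumes the ebook's body text lives in `<p>` tags, and the names `ParagraphExtractor`, `extract_paragraphs`, and `chunk_for_tts`, along with the `max_chars` limit, are hypothetical. The speech synthesis step itself is left out, since no specific text-to-speech interface is described.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p> element and ignores everything else.

    Assumption: the ebook HTML keeps its readable body text in <p> tags.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs, in document order
        self._buffer = None   # text pieces of the open <p>, or None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a paragraph.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Collapse line breaks and repeated spaces into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def extract_paragraphs(html):
    """Return the paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def chunk_for_tts(paragraphs, max_chars=400):
    """Group whole paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current} {para}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk returned by `chunk_for_tts(extract_paragraphs(html))` could then be handed to whatever speech engine is in use and the resulting audio files joined in order. Splitting on paragraph boundaries is just one reasonable choice for keeping each request short without cutting a sentence in half.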
The AI cloned – or recreated – its voice from sample recordings in order to narrate the ebooks. Using advanced AI technology, they were able to add emotions to the words spoken by the AI. "Our system uses new advances in neural text-to-speech, emotion recognition, custom voice cloning, and distributed computing to create engaging and lifelike audiobooks," they wrote in a paper about the steps they took. This process is roughly similar to the case of actor Edward Herrmann, whose voice was recently cloned to create an audiobook.
The number of AI audiobooks produced by Project Gutenberg et al. is huge when you consider that Penguin Random House Audio, one of the largest audiobook production houses in the entire publishing industry, produces only about 2,400 audiobooks per year.
So how do these AI-produced audiobooks compare to human-narrated ones?
How Do Project Gutenberg's AI-Produced Audiobooks Sound?
I listened to some of the 5,000 audiobooks, which included nonfiction, fiction, and poetry, such as The Black Tulip by Alexandre Dumas, The Philippine Islands by Ramon Reyes Lala, Stories of King Arthur's Knights, Told to the Children by Mary MacGregor, The Call of the Wild by Jack London, and Up From Slavery by Booker T. Washington, among others.
Although I was able to find titles by authors of color, they are far outnumbered by the audiobooks by white authors on the list. Publishing has always been white, with gatekeepers still reckoning with the past. Project Gutenberg's list reflects this: it includes many classics by white authors that have been turned into audiobooks. Given that it only took them about 30 minutes to produce an AI audiobook, it wouldn't hurt for this project to include these 100 classic books by authors of color in the future. This would ensure that, as technology advances, marginalized groups aren't left behind and feel seen in literature. And that can only happen if developers keep diversity in mind.
Meanwhile, although the recordings do sound human-like, the voices are flat and emotionless. There's no variation in voices when it comes to dialogue, as there seem to be no female voices available. In addition, the stories lack the ability to truly touch the reader's emotions. There's no control over pacing or dramatic narration, and the same voice is used for all audiobooks, effectively erasing personalization and characterization.
Will AI Replace Human-Recorded Audiobooks?
While the voices do sound human in these AI audiobooks, the art of good narration – accent, pacing, dramatic pronunciation, characterization, and so on – is lacking. Human narrators effectively set the scene, making you fall in love and feel at ease with the story.
Listening to AI audiobooks, on the other hand, doesn't provide stimulation. They say that a narrator can make or break an audiobook, and it's true enough here. Although there are some titles worth checking out from the catalog, they are undermined by the monotonous narration.
In addition to style, almost all of the audiobooks have the same AI narrator. The AI voice reads everything the same way, whether it's fiction, poetry, or nonfiction, and I frequently mistook them for the same audiobook. It's too similar. Too flat. It will be some time before AI technology can do what human narrators do, but I believe that it's gradually improving.
These AI audiobooks aren't perfect, but I believe that they will benefit those who can't afford to buy audiobooks, which are extremely expensive. They're often more than twice the price of a paperback, so some of the titles in Project Gutenberg's catalog may be of help. There are libraries that offer audiobooks both online and offline, and some retailers offer discounts as well, so if titles are not available there, listeners can opt for these AI audiobooks instead.
For the publisher's part, these AI audiobooks won't be much of a help, either. Because Audible's audiobook self-publishing platform, ACX, doesn't accept "text-to-speech or other automated recordings," these AI-produced audiobooks will not be available on Audible anytime soon. I'm assuming that the same requirements apply to traditional publishers as well. However, Amazon's self-publishing platform, Kindle Direct Publishing, took a sharp turn in November when it announced that it would beta-test a feature that produces AI audiobooks from print books.
Although AI may pose a threat to the publishing industry, especially to narrators, it has proven to be beneficial to disabled people, such as Book Riot Contributing Editor Kendra Winchester, who writes about audiobooks and disability literature.
For Winchester, AI narration could prove useful in other ways. She already uses Apple's screen reader app on her phone, and using AI narration technology to create a better screen reader could prove beneficial. Still, disabled people deserve more than flat, emotionless AI audiobooks. "For disabled people to truly have the access to books that we deserve, the audiobooks available shouldn't be stripped of all of the humanity that narrators bring to their performances," she wrote.
Bert Baxter, a member of the Deaf community, heavily relies on audiobooks for accessing written content. He said that the emergence of AI audiobooks has brought an exciting potential to enhance the Deaf community's reading experience. Although he believes that AI audiobooks have the potential to greatly improve accessibility for Deaf people, he emphasizes the importance of AI audiobooks being produced with accessibility in mind, including support for different reading speeds and navigation options.
What Does This Mean for the Audiobook Industry?
These AI audiobooks appear impressive at first listen, but we're actually still a long way from widespread adoption of AI in audiobook production.
"For now, these options are mainly being considered by self-publishing authors and academic publishers – or publishers that simply don't have the resources to handle audiobook production," publishing consultant Jane Friedman said when I asked her about the subject earlier this year. "While human narrators may feel threatened by this, I haven't seen AI replacing jobs that would today be done by human narrators. It could happen in the future, especially if popular narrators license their voices for use."
But given how quickly technology advances, how long will human narrators have before AI narrators "catch up"?
"AI narrators have already caught up to human narrators in the wild," said Sil Hamilton, a Language Model Researcher at McGill University.
Project Gutenberg is not the only organization using AI narrators to produce audiobooks; Apple has been doing so for at least the past nine months. Its feature, called digital narration, allows publishers to produce audiobooks out of their ebooks. Apple Books competes with Amazon's Kindle Direct Publishing, which is the most popular self-publishing platform. Hamilton told me that because KDP doesn't allow AI narrators, it's possible that they don't allow digital narrators to differentiate themselves, and that many audiobook narrators were shocked by what Apple did. Apple, like Project Gutenberg, may require AI narrators to bridge the gap, he said:
"However, whether their use in the wild determines whether AI narrators have 'caught up' to human narrators is only one heuristic," Hamilton continued.
He explained that diffusion models, language models, and other predictive or generative deep learning algorithms all function by developing an understanding of their input data... While larger models can create more sophisticated representations of their data domain, they're increasingly reaching computational limits. "The human voice exists in a narrow frequency range centered around 4000Hz, but as you suggest voice modifiers like intonation, implication, etc., all depend on the mind; not the voice – perhaps a great AI narrator needs to understand the human condition before they perfectly mimic us," Hamilton clarified. "But whether that is required to automate away narrators' jobs is unfortunately another question."
These AI-produced audiobooks are yet another chapter in the saga of AI eroding human creativity. I hope it gets regulated in the future because producing audiobooks on such a large scale may cause the industry to crumble.
These AI voices will definitely improve over time, so there must be safeguards in place when using them.





