Leading Authors of Today's Magazine

Erotica, Atwood, and ‘For Dummies’: The Books Behind Meta’s Generative AI

May 27, 2024


Editor’s note: This article is part of The Atlantic’s series on Books3. You can search the database for yourself here, and read about its origins here.

This summer, I reported on a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. “Books3,” as it’s called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by Stephen King and Margaret Atwood, and a lot more. It is now at the center of several lawsuits brought against Meta by writers who claim that its use amounts to copyright infringement.

Books play a crucial role in the training of generative-AI systems. Their long, thematically consistent paragraphs provide information about how to construct long, thematically consistent paragraphs—something that’s essential to creating the illusion of intelligence. Consequently, tech companies use huge data sets of books, typically without permission, purchase, or licensing. (Lawyers for Meta argued in a recent court filing that neither outputs from the company’s generative AI nor the model itself are “substantially similar” to existing books.)

In its training process, a generative-AI system essentially builds a giant map of English words—the distance between two words correlates with how often they appear near each other in the training text. The final system, known as a large language model, will produce more plausible responses for subjects that appear more often in its training text. (For further details on this process, you can read about transformer architecture, the innovation that precipitated the boom in large language models such as LLaMA and ChatGPT.) A system trained primarily on the Western canon, for example, will produce poor answers to questions about Eastern literature. This is just one reason it’s important to understand the training data used by these models, and why it’s troubling that there is generally so little transparency.
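The "map of words" idea can be made concrete with a toy sketch. The code below is not how LLaMA or any production model is trained (those systems learn word positions through neural-network training rather than direct counting); it only illustrates the intuition that words appearing near each other more often end up "closer." The corpus, the window size, and the function names `cooccurrence_counts` and `distance` are all hypothetical, invented for this illustration.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often each unordered pair of words appears
    within `window` words of each other."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # Look only at the next `window` words to avoid double counting.
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    counts[frozenset((word, other))] += 1
    return counts


def distance(counts, a, b):
    """Toy distance (an assumption, not a standard metric):
    it shrinks as the pair co-occurs more often."""
    return 1.0 / (1.0 + counts[frozenset((a, b))])


# Hypothetical three-sentence corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat chased the dog",
    "the stock market fell",
]

counts = cooccurrence_counts(corpus)
d_cat_the = distance(counts, "cat", "the")        # pair seen together often
d_cat_market = distance(counts, "cat", "market")  # pair never seen together

print(d_cat_the, d_cat_market)
```

In this toy corpus, "cat" lands closer to "the" than to "market," simply because the first pair turns up together and the second never does. A subject that rarely appears in the training text leaves the map similarly sparse, which is the intuition behind the Western canon example above.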

With that in mind, here are some of the most represented authors in Books3, with the approximate number of entries contributed:

Although 24 of the 25 authors listed here are fiction writers (the lone exception is Betty Crocker), the data set is two-thirds nonfiction overall. It includes several thousand technical manuals; more than 1,500 books from Christian publishers (including at least 175 Bibles and Bible commentaries); more than 400 Dungeons & Dragons– and Magic: The Gathering–themed books; and 46 titles by Charles Bukowski. Nearly every subject imaginable is covered (including How to Housebreak Your Dog in 7 Days), but the collection skews heavily toward the interests and perspectives of the English-speaking Western world.

Many people have written about bias in AI systems. An AI-based face-recognition program, for example, that’s trained disproportionately on images of light-skinned people might work less well on images of people with darker skin—with potentially disastrous outcomes. Books3 helps us see the problem from another angle: What combination of books would be unbiased? What would be an equitable distribution of Christian, Muslim, Buddhist, and Jewish subjects? Are extremist views balanced by moderate ones? What’s the proper ratio of American history to Chinese history, and what perspectives should be represented within each? When knowledge is organized and filtered by algorithm rather than by human judgment, the problem of perspective becomes both crucial and intractable.


Books3 is a gigantic data set. Here are just a few different ways to consider the authors, books, and publishers contained within. Note that the samples presented here are not comprehensive; they are chosen to give a quick sense of the many different types of writing used to train generative AI. As above, book counts may include multiple editions.


As AI chatbots begin to replace traditional search engines, the tech industry’s power to constrain our access to information and manipulate our perspective increases exponentially. If the internet democratized access to information by eliminating the need to go to a library or consult an expert, the AI chatbot is a return to the old gatekeeping model, but with a gatekeeper that’s opaque and unaccountable—a gatekeeper, moreover, that is prone to “hallucinations” and might or might not cite sources.

In its recent court filing—a motion to dismiss the lawsuit brought by the authors Richard Kadrey, Sarah Silverman, and Christopher Golden—Meta observed that “Books3 comprises an astonishingly small portion of the total text used to train LLaMA.” This is technically true (I estimate that Books3 is about 3 percent of LLaMA’s total training text) but sidesteps a core concern: If LLaMA can summarize Silverman’s book, then it likely relies heavily on the text of her book to do so. In general, it’s hard to know how much any given source contributes to a generative-AI system’s output, given the impenetrability of current algorithms.

Still, our only clue to the kinds of information and opinions AI chatbots will dispense is their training data. A look at Books3 is a good start, but it’s just one corner of the training-data universe, most of which remains behind closed doors.



© 2024 Today's Author Magazine. All Rights Are Reserved.
