Billion Dollar Companies Like Apple And Nvidia Are Swiping YouTube Content To Train Their AI

Apple, Nvidia and Salesforce are using content on YouTube to train their AI.

Subtitles from 173,536 YouTube videos spread across 48,000 YouTube channels were used by these companies as training data despite YouTube’s rules against harvesting information, according to Proof News and Wired.

The dataset – called YouTube Subtitles – includes transcripts from educational channels like Khan Academy, MIT, and Harvard, as well as media outlets such as The Wall Street Journal, NPR, and the BBC.

Late-night shows like The Late Show, Last Week Tonight, and Jimmy Kimmel Live were also used, the report says.

Additionally, Proof News found that popular YouTubers like MrBeast, Marques Brownlee, Jacksepticeye, and PewDiePie had their videos included.

David Pakman, host of The David Pakman Show, which sports more than 2 million subscribers and more than 2 billion views, commented: “No one came to me and said, ‘We would like to use this.’”

“This is my livelihood, and I put time, resources, money, and staff time into creating this content. There’s really no shortage of work,” he added, arguing that if AI companies are paid, he should be compensated for his data.

Dave Wiskus, the CEO of Nebula, didn’t mince words: “It’s theft. Will this be used to exploit and harm artists? Yes, absolutely.”

The data was part of ‘The Pile’, a publicly released compilation of data that includes content from YouTube, the European Parliament, English Wikipedia and corporate emails.

Apple utilized the Pile for OpenELM before adding new AI features to its products. Bloomberg and Databricks also leveraged the Pile, according to their publications. Anthropic, an AI company backed by a $4 billion Amazon investment, confirmed its use of the Pile for its AI assistant, Claude, while emphasizing compliance with YouTube’s terms, Wired wrote.

Salesforce used the Pile for an AI model intended for academic and research purposes, releasing it publicly in 2022. This model has been downloaded over 86,000 times.

Litigation against companies using unauthorized data for AI training is ongoing. Authors have sued over the use of works in datasets like Books3, another Pile component. Tech companies argue their actions fall under fair use, but those cases remain unresolved.

Read Wired’s full story here.

Tyler Durden
Wed, 07/17/2024 – 16:40
