Major tech firms trained AI using stolen YouTube content: Report

  • Subtitles from 173,536 YouTube videos ripped from the site
  • Tech giants Apple, Anthropic, Nvidia, Salesforce used data to train AI
  • Content creators, YouTube guidelines staunchly against the practice
FILE - The YouTube app is displayed on an iPad in Baltimore. YouTube has blocked access to videos of a protest song in Hong Kong, days after a court approved an injunction banning the song in the city. (AP Photo/Patrick Semansky, File)

Tech giants Apple, Anthropic, Nvidia and Salesforce pilfered data from tens of thousands of content creators on YouTube to train AI.

An investigation by Proof News, copublished with Wired, found that the Silicon Valley luminaries fed subtitles from 173,536 YouTube videos across more than 48,000 channels to AI programs.

The “YouTube Subtitles” dataset was allegedly ripped from the platform and its users without permission, an act that violates YouTube’s guidelines.

“No one came to me and said, ‘We would like to use this,’” David Pakman, host of “The David Pakman Show,” told Proof News.


“This is my livelihood, and I put time, resources, money and staff time into creating this content,” Pakman added. “There’s really no shortage of work.” 

“Apple has sourced data for their AI from several companies,” Marques Brownlee, known by his handle MKBHD, shared on X. “One of them scraped tons of data/transcripts from YouTube videos, including mine. Apple technically avoids ‘fault’ here because they’re not the ones scraping. But this is going to be an evolving problem for a long time.”

YouTubers aren’t the only ones whose content was used for the dataset. The report also found transcripts from online learning resources, including Khan Academy, Harvard and MIT videos, as well as news media outlets such as NPR, BBC, “The Late Show With Stephen Colbert” and “Last Week Tonight With John Oliver.”

According to research published by EleutherAI, the “YouTube Subtitles” dataset is part of a larger body of language modeling data called the Pile, which the companies drew on to train their AI.

Click here to access a tool that shows which YouTube channels and videos were siphoned into the dataset.

Copyright 2024 Nexstar Media Inc. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed
