Unauthorized YouTube Dataset Used for Artificial Intelligence Models Revealed

A YouTube dataset used to train artificial intelligence models was shared without permission, raising privacy concerns. Details are in our report.


Proof News revealed in a new investigation that some of the world’s leading technology companies used subtitles from more than 173,000 YouTube videos without permission to train AI models.

The data set, created by a non-profit organization called EleutherAI, contained subtitles of videos drawn from more than 48,000 YouTube channels. Companies including Apple, NVIDIA, and Anthropic used it. The findings once again highlighted that AI technology is often developed with data obtained from creators without permission or compensation.

Although the data set did not include any videos or visuals from YouTube, it included subtitles from some of the platform’s biggest content creators (such as Marques Brownlee and MrBeast) and from major news publishers like The New York Times, BBC, and ABC News. Google had previously stated that using YouTube data for AI training could violate the platform’s terms of service, and a Google spokesperson confirmed that this position still stands.

However, Apple, NVIDIA, Anthropic, and EleutherAI have not made an official statement on the matter so far. AI companies are often criticized for not being transparent about the sources of the data used to train their models. While artists are uncomfortable with this situation, companies avoid answering questions about it.

Platforms like YouTube, among the world’s largest video services, have become massive data pools containing text, audio, video, and images. However, as Alphabet CEO Sundar Pichai has pointed out, using YouTube data to train AI models would constitute a serious violation of the platform’s terms.
