How to train your data | The Vergecast

Training data is the raw material of the AI industry. Claude, ChatGPT, Gemini, and the rest are built on top of oceans of stuff. What is that stuff? Books. Blog posts. YouTube videos. Reddit comments. All of it and more, in virtually incomprehensible quantities. Alex Reisner, a staff writer at The Atlantic who has been investigating training data, explains how AI companies get all this data, why they'd really prefer you not know what's in it, and whether training data could ever be a fair trade.

00:00 Intro
01:02 90 Seconds on The Verge
03:18 Why Training Data Matters
08:43 Common Crawl and Filtering
11:51 Academia and Data Laundering
15:37 YouTube as Data Mine
20:01 Synthetic Data Myth
21:59 Paying Creators for Data
23:13 Wrap Up and Credits

Subscribe: http://goo.gl/G5RXGs
Like The Verge on Facebook: https://goo.gl/2P1aGc
Follow on Twitter: https://goo.gl/XTWX61
Follow on Instagram: https://goo.gl/7ZeLv

Watch The Vergecast on YouTube: https://bit.ly/40RFRkg
The Vergecast Podcast: https://bit.ly/3WQDexZ
Decoder with Nilay Patel: http://apple.co/3v29nDc
More about our podcasts: https://www.theverge.com/podcasts

Read More: http://www.theverge.com
Community guidelines: http://bit.ly/2D0hlAv
Wallpapers from The Verge: https://bit.ly/2xQXYJr
Shop our Verge merch store here: https://bit.ly/4kPCmEc

Subscribe to The Verge: https://bit.ly/3FT6n5S

If you buy something from a Verge link, Vox Media may receive a commission without exerting any influence on editorial content. For more information about our ethics policy, visit: https://bit.ly/3ZWTlLs
