Subscribe now
Don’t miss out on the upcoming videos - sign up today and fuel your AI knowledge.
AI Data Chats | Series: Practitioners
In this episode of AI Data Chats, we are joined by Shayne Longpre from MIT, the lead of the Data Provenance Initiative. Shayne and his team have audited thousands of pre-training datasets to trace how multimodal datasets are sourced and reused across the web. In our chat, Shayne explains why it matters to understand the origins of the "organic" human data that seeds the synthetic data now proliferating across emerging datasets. He also shares results from the Data Provenance Initiative's audits showing that video and audio datasets are sourced from a much smaller handful of platforms than text datasets, and discusses what this concentration may mean if those platforms restrict access to video and audio data for researchers and civil society.
Jennifer Ding, Solutions Engineer @ Encord
Shayne Longpre, MIT; lead of the Data Provenance Initiative
Watch Encord’s ML Solutions Engineer Jennifer Ding interview key thought leaders in the AI data space.
Explore the future of AI through expert-led conversations on data, deep learning, and real-world impact.