Ethical AI Data

26:51
 

Manage episode 466165114 series 2877784
Content provided by S&P Global Market Intelligence. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by S&P Global Market Intelligence or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://podcastplayer.com/legal.

Ethical concerns about the use of AI have to start with training data. Too often, the primary concern is simply generating sufficient data rather than understanding its nature. Emily Jasper and Abby Simmons are back to continue the conversation started in episode 198 with host Eric Hanselman. With generative AI, the data is the application in its most formative sense. Unlike traditional application development, where the expectation is that functionality will be expanded in later releases, GenAI applications require careful design of training data before training takes place. The perspectives contained in data age rapidly, and model training doesn't distinguish outdated signals from current ones. Old data can effectively poison model outputs. Businesses risk alienating customers with models trained on data that doesn't properly represent them. This is particularly true for marginalized communities, where language and context can change over shorter time frames.

While model retraining remains an active research area, AI work today has to focus on effective data quality management. DeepSeek is prompting a significant rethinking. Human data cleansing can be effective but can't scale to AI demands. Data workbench tools and synthetic data approaches can help, but better automation is needed to ensure that data sets are truly representative. Data collection and data sourcing need much greater attention to ensure that model results can engage the target audience rather than become a liability. It's a fundamental question of accountability, one that requires thinking differently from legacy development processes.

Mentioned in this episode:

More S&P Global Content:

Credits:


102 episodes

Next in Tech
