Running data-driven evaluations of AI engineering tools

37:35
 

AI engineering tools are evolving fast. New coding assistants, debugging agents, and automation platforms emerge every month. Engineering leaders want to take advantage of these innovations while avoiding costly experiments that create more distraction than impact.

In this episode of the Engineering Enablement podcast, host Laura Tacho and Abi Noda outline a practical model for evaluating AI tools with data. They explain how to shortlist tools by use case, run trials that mirror real development work, select representative cohorts, and ensure consistent support and enablement. They also highlight why baselines and frameworks like DX’s Core 4 and the AI Measurement Framework are essential for measuring impact.

Where to find Laura Tacho:

• LinkedIn: https://www.linkedin.com/in/lauratacho/

• X: https://x.com/rhein_wein

• Website: https://lauratacho.com/

• Laura’s course (Measuring Engineering Performance and AI Impact): https://lauratacho.com/developer-productivity-metrics-course

Where to find Abi Noda:

• LinkedIn: https://www.linkedin.com/in/abinoda

• Substack: https://substack.com/@abinoda

In this episode, we cover:

(00:00) Intro: Running a data-driven evaluation of AI tools

(02:36) Challenges in evaluating AI tools

(06:11) How often to reevaluate AI tools

(07:02) Incumbent tools vs challenger tools

(07:40) Why organizations need disciplined evaluations before rolling out tools

(09:28) How to size your tool shortlist based on developer population

(12:44) Why tools must be grouped by use case and interaction mode

(13:30) How to structure trials around a clear research question

(16:45) Best practices for selecting trial participants

(19:22) Why support and enablement are essential for success

(21:10) How to choose the right duration for evaluations

(22:52) How to measure impact using baselines and the AI Measurement Framework

(25:28) Key considerations for an AI tool evaluation

(28:52) Q&A: How reliable is self-reported time savings from AI tools?

(32:22) Q&A: Why not adopt multiple tools instead of choosing just one?

(33:27) Q&A: Tool performance differences and avoiding vendor lock-in

