Search a title or topic

Over 20 million podcasts, powered by 

Player FM logo
Artwork

Content provided by CCC media team. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by CCC media team or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
Player FM - Podcast App
Go offline with the Player FM app!

Anonymization of sensitive information in financial documents (sps25)

31:17
 
Share
 

Manage episode 514675686 series 2475293
Content provided by CCC media team. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by CCC media team or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
Data is the fossil fuel of the machine learning world, essential for developing high quality models but in limited supply. Yet institutions handling sensitive documents — such as financial, medical, or legal records often cannot fully leverage their own data due to stringent privacy, compliance, and security requirements, making training high quality models difficult. A promising solution is to replace the personally identifiable information (PII) with realistic synthetic stand-ins, whilst leaving the rest of the document in tact. In this talk, we will discuss the use of open source tools and models that can be self hosted to anonymize documents. We will go over the various approaches for Named Entity Recognition (NER) to identify sensitive entities and the use of diffusion models to inpaint anonymized content. about this event: https://talks.python-summit.ch/sps25/talk/EWMBRM/
  continue reading

2014 episodes

Artwork
iconShare
 
Manage episode 514675686 series 2475293
Content provided by CCC media team. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by CCC media team or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
Data is the fossil fuel of the machine learning world, essential for developing high quality models but in limited supply. Yet institutions handling sensitive documents — such as financial, medical, or legal records often cannot fully leverage their own data due to stringent privacy, compliance, and security requirements, making training high quality models difficult. A promising solution is to replace the personally identifiable information (PII) with realistic synthetic stand-ins, whilst leaving the rest of the document in tact. In this talk, we will discuss the use of open source tools and models that can be self hosted to anonymize documents. We will go over the various approaches for Named Entity Recognition (NER) to identify sensitive entities and the use of diffusion models to inpaint anonymized content. about this event: https://talks.python-summit.ch/sps25/talk/EWMBRM/
  continue reading

2014 episodes

すべてのエピソード

×
 
Loading …

Welcome to Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Copyright 2025 | Privacy Policy | Terms of Service | | Copyright
Listen to this show while you explore
Play