Content provided by datasciencehappywarriors. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by datasciencehappywarriors or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

Episode 2: Data Wrangling: Why you gotta do what you gotta do

12:43
 

A common complaint about data science is that 90% of your time is spent on data wrangling. In this episode, I talk about some of the history that led to this current state of data science work, and why you should embrace it. I also give some resources that will help you with data wrangling at the raw level.

R Packages and Tools mentioned in this episode:

R:

Package        Description
lubridate      Handling dates, datetimes, intervals, durations
readr          Reading in CSV and related textual files
readxl         Reading in Excel files
jsonlite       Reading, writing and manipulating JSON structures
httr           Reading HTML and extracting parts programmatically
dplyr + purrr  Simple grammar for common data manipulations

Command line tools:

Utility    Description
head       Show first few lines of a text file
less [-S]  Pager to make sure data you look at doesn't scroll off the screen
wc         Count lines, words, and characters in a file
csvlook    Python package that helps format and manipulate CSV files from command line
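
As a rough illustration of how these utilities might be combined for a first look at a raw file, here is a minimal sketch. The sample file and its columns (id, date, value) are made up for the example and are not from the episode; csvlook is only run if it happens to be installed (it ships with the csvkit Python package).

```shell
#!/bin/sh
# Hypothetical sample data, created only so the commands have something to read.
sample="$(mktemp)"
printf 'id,date,value\n1,2021-01-05,3.2\n2,2021-01-06,4.8\n3,2021-01-07,5.1\n' > "$sample"

# head: show the header row and the first two records.
head -n 3 "$sample"

# wc -l: count lines (header plus records); tr strips padding some systems add.
line_count=$(wc -l < "$sample" | tr -d ' ')
echo "lines: $line_count"

# csvlook: render the CSV as an aligned table, if csvkit is available.
if command -v csvlook >/dev/null 2>&1; then
    csvlook "$sample"
fi

# For interactive browsing, `less -S "$sample"` chops long lines instead of
# wrapping them, so wide rows stay on one screen line.
```

The less call is left as a comment because it is interactive; run it by hand on a real file when the rows are too wide for the terminal.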


5 episodes

