Search a title or topic

Over 20 million podcasts, powered by 

Player FM logo
Artwork

Content provided by HPR Volunteer and Hacker Public Radio. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by HPR Volunteer and Hacker Public Radio or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
Player FM - Podcast App
Go offline with the Player FM app!

HPR4404: Kevie nerd snipes Ken by grepping xml

 
Share
 

Manage episode 489591309 series 108988
Content provided by HPR Volunteer and Hacker Public Radio. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by HPR Volunteer and Hacker Public Radio or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
More Command line fun: downloading a podcast In the show hpr4398 :: Command line fun: downloading a podcast Kevie walked us through a command to download a podcast. He used some techniques here that I hadn't used before, and it's always great to see how other people approach the problem. Let's have a look at the script and walk through what it does, then we'll have a look at some "traps for young players" as the EEVBlog is fond of saying. Analysis of the Script wget `curl https://tuxjam.otherside.network/feed/podcast/ | grep -o 'https*://[^"]*ogg' | head -1` It chains four different commands together to "Save the latest file from a feed". Let's break it down so we can have checkpoints between each step. I often do this when writing a complex one liner - first do it as steps, and then combine it. The curl command gets https://tuxjam.otherside.network/feed/podcast/ . To do this ourselves we will call curl https://tuxjam.otherside.network/feed/podcast/ --output tuxjam.xml , as the default file name is index.html. This gives us a xml file, and we can confirm it's valid xml with the xmllint command. $ xmllint --format tuxjam.xml >/dev/null $ echo $? 0 Here the output of the command is ignored by redirecting it to /dev/null Then we check the error code the last command had. As it's 0 it completed sucessfully. Kevie then passes the output to the grep search command with the option -o and then looks for any string starting with https followed by anything then followed by two forward slashes, then -o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line We can do the same with. I was not aware that grep defaulted to regex, as I tend to add the --perl-regexp to explicitly add it. grep --only-matching 'https*://[^"]*ogg' tuxjam.xml http matches the characters http literally (case sensitive) s* matches the character s literally (case sensitive) Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy] : matches the character : literally / matches the character / literally / matches the character / literally [^"]* match a single character not present in the list below Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy] " a single character in the list " literally (case sensitive) ogg matches the characters ogg literally (case sensitive) When we run this ourselves we get the following $ grep --only-matching 'https*://[^"]*ogg' tuxjam.xml https://archive.org/download/tuxjam-121/tuxjam_121.ogg https://archive.org/download/tuxjam-120/TuxJam_120.ogg https://archive.org/download/tux-jam-119/TuxJam_119.ogg https://archive.org/download/tuxjam_118/tuxjam_118.ogg https://archive.org/download/tux-jam-117-uncut/TuxJam_117.ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://archive.org/download/tuxjam_116/tuxjam_116.ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://t
  continue reading

4549 episodes

Artwork
iconShare
 
Manage episode 489591309 series 108988
Content provided by HPR Volunteer and Hacker Public Radio. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by HPR Volunteer and Hacker Public Radio or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
More Command line fun: downloading a podcast In the show hpr4398 :: Command line fun: downloading a podcast Kevie walked us through a command to download a podcast. He used some techniques here that I hadn't used before, and it's always great to see how other people approach the problem. Let's have a look at the script and walk through what it does, then we'll have a look at some "traps for young players" as the EEVBlog is fond of saying. Analysis of the Script wget `curl https://tuxjam.otherside.network/feed/podcast/ | grep -o 'https*://[^"]*ogg' | head -1` It chains four different commands together to "Save the latest file from a feed". Let's break it down so we can have checkpoints between each step. I often do this when writing a complex one liner - first do it as steps, and then combine it. The curl command gets https://tuxjam.otherside.network/feed/podcast/ . To do this ourselves we will call curl https://tuxjam.otherside.network/feed/podcast/ --output tuxjam.xml , as the default file name is index.html. This gives us a xml file, and we can confirm it's valid xml with the xmllint command. $ xmllint --format tuxjam.xml >/dev/null $ echo $? 0 Here the output of the command is ignored by redirecting it to /dev/null Then we check the error code the last command had. As it's 0 it completed sucessfully. Kevie then passes the output to the grep search command with the option -o and then looks for any string starting with https followed by anything then followed by two forward slashes, then -o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line We can do the same with. I was not aware that grep defaulted to regex, as I tend to add the --perl-regexp to explicitly add it. grep --only-matching 'https*://[^"]*ogg' tuxjam.xml http matches the characters http literally (case sensitive) s* matches the character s literally (case sensitive) Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy] : matches the character : literally / matches the character / literally / matches the character / literally [^"]* match a single character not present in the list below Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy] " a single character in the list " literally (case sensitive) ogg matches the characters ogg literally (case sensitive) When we run this ourselves we get the following $ grep --only-matching 'https*://[^"]*ogg' tuxjam.xml https://archive.org/download/tuxjam-121/tuxjam_121.ogg https://archive.org/download/tuxjam-120/TuxJam_120.ogg https://archive.org/download/tux-jam-119/TuxJam_119.ogg https://archive.org/download/tuxjam_118/tuxjam_118.ogg https://archive.org/download/tux-jam-117-uncut/TuxJam_117.ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://archive.org/download/tuxjam_116/tuxjam_116.ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://t
  continue reading

4549 episodes

सभी एपिसोड

×
 
Loading …

Welcome to Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Copyright 2025 | Privacy Policy | Terms of Service | | Copyright
Listen to this show while you explore
Play