HPR4404: Kevie nerd snipes Ken by grepping xml

Hacker Public Radio

Content provided by HPR Volunteer and Hacker Public Radio. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by HPR Volunteer and Hacker Public Radio or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

6M ago

MP3•Episode home

More Command line fun: downloading a podcast In the show hpr4398 :: Command line fun: downloading a podcast Kevie walked us through a command to download a podcast. He used some techniques here that I hadn't used before, and it's always great to see how other people approach the problem. Let's have a look at the script and walk through what it does, then we'll have a look at some "traps for young players" as the EEVBlog is fond of saying. Analysis of the Script wget `curl https://tuxjam.otherside.network/feed/podcast/ | grep -o 'https*://[^"]*ogg' | head -1` It chains four different commands together to "Save the latest file from a feed". Let's break it down so we can have checkpoints between each step. I often do this when writing a complex one liner - first do it as steps, and then combine it. The curl command gets https://tuxjam.otherside.network/feed/podcast/ . To do this ourselves we will call curl https://tuxjam.otherside.network/feed/podcast/ --output tuxjam.xml , as the default file name is index.html. This gives us a xml file, and we can confirm it's valid xml with the xmllint command. $ xmllint --format tuxjam.xml >/dev/null $ echo $? 0 Here the output of the command is ignored by redirecting it to /dev/null Then we check the error code the last command had. As it's 0 it completed sucessfully. Kevie then passes the output to the grep search command with the option -o and then looks for any string starting with https followed by anything then followed by two forward slashes, then -o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line We can do the same with. I was not aware that grep defaulted to regex, as I tend to add the --perl-regexp to explicitly add it. grep --only-matching 'https*://[^"]*ogg' tuxjam.xml http matches the characters http literally (case sensitive) s* matches the character s literally (case sensitive) Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy] : matches the character : literally / matches the character / literally / matches the character / literally [^"]* match a single character not present in the list below Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy] " a single character in the list " literally (case sensitive) ogg matches the characters ogg literally (case sensitive) When we run this ourselves we get the following $ grep --only-matching 'https*://[^"]*ogg' tuxjam.xml https://archive.org/download/tuxjam-121/tuxjam_121.ogg https://archive.org/download/tuxjam-120/TuxJam_120.ogg https://archive.org/download/tux-jam-119/TuxJam_119.ogg https://archive.org/download/tuxjam_118/tuxjam_118.ogg https://archive.org/download/tux-jam-117-uncut/TuxJam_117.ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://archive.org/download/tuxjam_116/tuxjam_116.ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://t

4549 episodes

#Tech #Community Radio #Tech Interviews #Linux #Open #Software Freedom #Tech News #HPR Volunteer #Hacker Public Radio #Podcasting Education #Hobbies