
How to extract an HTML anchor's href from a file in Bash?

Tags: html, bash

The page https://developer.android.com/studio/index.html contains a link to the Android SDK tools for Linux, which I'd like to download with a script. Unfortunately, there is no fixed link that always points to the latest version, so I'd like to extract the link from the HTML itself.

The link is identified by the id linux-tools and spans multiple lines:

  <a onclick="return onDownload(this)" id="linux-tools" data-modal-toggle="studio_tos"
    href="https://dl.google.com/android/repository/sdk-tools-linux-3859397.zip">sdk-tools-linux-38593

I'd like to extract that href into a variable in a Bash script. The closest I've gotten so far is the following:

grep -o -z '<a.[^<]*id="linux-tools"[^<]*</a>' index.html

which outputs the above two lines.

How do I get at the actual link using typically-available shell commands?


1 Answer

You can use sed to first select the range of lines you want to work on, for example:

sed -n '/id="linux-tools"/,+1 p' index.html

That prints the line containing id="linux-tools" plus the one line after it. Note that the ,+1 address form is a GNU sed extension.
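
The effect of the range address is easiest to see on a small sample. The sketch below assumes GNU sed and uses a made-up four-line file (the file contents, the link text, and the variable names sample and selected are illustrative, not taken from the real page). Only the matching line and the one after it should be selected:

```shell
# Hypothetical sample file; the contents are illustrative, not the real page.
sample=$(mktemp)
printf '%s\n' \
  '<p>before</p>' \
  '<a onclick="return onDownload(this)" id="linux-tools" data-modal-toggle="studio_tos"' \
  '  href="https://dl.google.com/android/repository/sdk-tools-linux-3859397.zip">link</a>' \
  '<p>after</p>' > "$sample"

# Select the line matching the pattern plus the one line that follows it
# (the addr,+N form requires GNU sed).
selected=$(sed -n '/id="linux-tools"/,+1 p' "$sample")
rm -f "$sample"

printf '%s\n' "$selected"
```

The surrounding paragraph lines are dropped, so any later substitution only ever sees the two lines that make up the opening tag.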

Now you can use sed's substitute command to extract just the href from that range:

sed -n '/id="linux-tools"/,+1 s/.*href="\([^"]*\).*$/\1/p' index.html
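
Since the question asks for the href in a variable, the command can be wrapped in a command substitution. This is a minimal sketch, assuming GNU sed and the same two-line layout as in the question; the temporary file stands in for index.html, and the variable name url and the link text are placeholders:

```shell
#!/usr/bin/env bash
# Stand-in for index.html (assumption: the anchor is split over two lines
# exactly as shown in the question; the link text is a placeholder).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<p>Other content</p>
<a onclick="return onDownload(this)" id="linux-tools" data-modal-toggle="studio_tos"
  href="https://dl.google.com/android/repository/sdk-tools-linux-3859397.zip">download</a>
EOF

# Command substitution captures sed's output in the variable.
url=$(sed -n '/id="linux-tools"/,+1 s/.*href="\([^"]*\).*$/\1/p' "$tmp")
rm -f "$tmp"

# Guard against an empty result, e.g. if the page layout changes.
if [ -z "$url" ]; then
  echo "linux-tools link not found" >&2
  exit 1
fi

echo "$url"
```

From there the value can be passed to a downloader, for example wget "$url". The empty-string check is worth keeping because the sed expression silently prints nothing if the id or the line layout ever changes.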
MauricioRobayo answered Sep 04 '25 15:09


