How to extract an HTML anchor's href from a file in Bash?

Question

On the page https://developer.android.com/studio/index.html, there is a link to the Android SDK tools for Linux, which I'd like to download with a script. Unfortunately, there is no "easy" link for downloading the latest version, so I'd like to extract the link from the HTML itself.

The link is identified by the id linux-tools and spans multiple lines:

  <a onclick="return onDownload(this)" id="linux-tools" data-modal-toggle="studio_tos"
    href="https://dl.google.com/android/repository/sdk-tools-linux-3859397.zip">sdk-tools-linux-38593

I'd like to extract that href into a variable in a Bash script. The closest I've gotten so far is the following:

grep -o -z '<a.[^<]*id="linux-tools"[^<]*</a>' index.html

which outputs the above two lines.

How do I get at the actual link using typically-available shell commands?

MauricioRobayo · Accepted Answer

You can use sed to first select the range of lines you want to work on, for example:

sed -n '/id="linux-tools"/,+1 p' index.html

That prints the line containing id="linux-tools" plus the one line that follows it. Note that the addr,+N address form is a GNU sed extension.

Now you can use sed's substitute command to extract just the href from that range:

sed -n '/id="linux-tools"/,+1 s/.*href="\([^"]*\).*$/\1/p' index.html
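
Since the question asks for the href in a variable, here is a minimal sketch of wrapping that command in command substitution. It assumes GNU sed (for the ,+1 range) and uses a temporary sample file in place of the real index.html. The anchor text "Linux tools" and the variable names html and url are made up for illustration and are not from the original page.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed, because the addr,+1 range is a GNU extension.

# Build a small sample file standing in for index.html.
# The anchor text "Linux tools" is invented for this example.
html=$(mktemp)
cat > "$html" <<'EOF'
<p>Other content</p>
  <a onclick="return onDownload(this)" id="linux-tools" data-modal-toggle="studio_tos"
    href="https://dl.google.com/android/repository/sdk-tools-linux-3859397.zip">Linux tools</a>
<p>More content</p>
EOF

# Select the line with id="linux-tools" plus the next line, then keep only
# what sits between href=" and the following double quote.
url=$(sed -n '/id="linux-tools"/,+1 s/.*href="\([^"]*\).*$/\1/p' "$html")

printf '%s\n' "$url"

rm -f "$html"
```

If the markup changes so that the href is more than one line away from the id, the ,+1 range would need adjusting. For anything beyond a quick script, a real HTML parser is likely to be more robust.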
