How to extract URL from html source with sed/awk or cut?

Question

I am writing a script that downloads an HTML page's source to a file, then reads the file and extracts a specific URL that follows a fixed piece of markup (it occurs only once).

Here is a sample that I need matched:

<img id="sample-image" class="photo" src="http://xxxx.com/some/ic/pic_1asda963_16x9.jpg"

The code preceding the URL will always be the same so I need to extract the part between:

<img id="sample-image" class="photo" src="

and the " after the URL.

I tried something with sed like this:

sed -n '\<img\ id=\"sample-image\"\ class=\"photo\"\ src=\",\"/p' test.txt

But it does not work. I would appreciate your suggestions, thanks a lot!

Gilles Quenot · Accepted Answer

You can use grep like this (the -P option and \K need GNU grep built with PCRE support):

grep -oP '<img\s+id="sample-image"\s+class="photo"\s+src="\K[^"]+' test.txt

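or with sed (-n and the p flag print only the matching line, and the leading and trailing .* drop the rest of that line):

sed -nr 's/.*<img\s+id="sample-image"\s+class="photo"\s+src="([^"]+)".*/\1/p' test.txt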

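or with awk, splitting fields on double quotes (this assumes the img tag holds the first quoted attributes on its line, so the URL is the sixth field):

awk -F'"' '/<img[[:space:]]+id="sample-image"/{print $6}' test.txt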
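
To try the extraction end to end, here is a minimal sketch that wraps a sed command in a small function and runs it on a generated sample file. The function name extract_url and the sample page content are assumptions made for illustration, and the pattern uses POSIX character classes with sed -E so that it does not depend on the GNU-only \s escape.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print the src URL of the
# <img id="sample-image" class="photo" ...> tag found in the file "$1".
extract_url() {
    sed -nE 's/.*<img[[:space:]]+id="sample-image"[[:space:]]+class="photo"[[:space:]]+src="([^"]+)".*/\1/p' "$1"
}

# Build a small sample page (placeholder content) to run the helper on.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html><body>
<div><img id="sample-image" class="photo" src="http://xxxx.com/some/ic/pic_1asda963_16x9.jpg" alt="x"></div>
</body></html>
EOF

# Capture the URL in a variable so a script can reuse it later.
url=$(extract_url "$sample")
printf '%s\n' "$url"

rm -f "$sample"
```

In a real script, the sample file would be replaced by the downloaded page. If the tag is absent, the function prints nothing, so the caller can check for an empty result.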

Tags:

sed

awk

Jason Carter
