How to extract URL from html source with sed/awk or cut?

Tags: sed, awk

I am writing a script that downloads an HTML page's source to a file, then reads the file and extracts a specific URL that appears right after a specific piece of markup (the markup occurs only once in the page).

Here is a sample that I need matched:

<img id="sample-image" class="photo" src="http://xxxx.com/some/ic/pic_1asda963_16x9.jpg"

The markup preceding the URL is always the same, so I need to extract the part between:

<img id="sample-image" class="photo" src="

and the " after the URL.

I tried something with sed like this:

sed -n '\<img\ id=\"sample-image\"\ class=\"photo\"\ src=\",\"/p' test.txt

But it does not work. I would appreciate your suggestions. Thanks a lot!

Jason Carter asked Dec 07 '25 23:12


1 Answer

You can use grep like this (the -P option and \K require GNU grep built with PCRE support):

grep -oP '<img\s+id="sample-image"\s+class="photo"\s+src="\K[^"]+' test.txt

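or with sed (GNU sed; -n together with the p flag prints only the line where the substitution succeeded, reduced to the URL):

sed -nr 's/.*<img\s+id="sample-image"\s+class="photo"\s+src="([^"]+)".*/\1/p' test.txt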

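or with awk, splitting fields on double quotes so that the URL is the sixth field (assuming nothing containing a double quote precedes the tag on its line):

awk -F'"' '/<img[[:space:]]+id="sample-image"/{print $6}' test.txt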
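As a rough sketch of how one of these commands might fit into the script described in the question, the function below wraps a sed extraction written with POSIX bracket classes and basic regular expressions instead of \s and -r, so it should not depend on GNU extensions. The function name extract_image_url, the file name page.html, and the example.com URL are placeholders introduced for illustration; they are not part of the original question or answer.

```shell
#!/bin/sh

# Hypothetical helper: print the src URL of the
# <img id="sample-image" class="photo" ...> tag found in the file "$1".
# Assumes the attributes appear in exactly this order, as in the question,
# and that the whole tag prefix and the URL sit on a single line.
extract_image_url() {
    # -n suppresses default output; the trailing p prints only lines
    # where the substitution matched, reduced to the captured URL.
    # head -n 1 keeps the first match in case the tag ever repeats.
    sed -n 's/.*<img[[:space:]]\{1,\}id="sample-image"[[:space:]]\{1,\}class="photo"[[:space:]]\{1,\}src="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Example usage (URL and file name are placeholders):
#   curl -s 'http://example.com/page' -o page.html
#   url=$(extract_image_url page.html)
#   [ -n "$url" ] && echo "Found: $url"
```

Checking that the result is non-empty before using it is worthwhile, since the function prints nothing when the tag is missing or the attribute order changes.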
Gilles Quenot answered Dec 10 '25 16:12


