
Extract HTML tag data with sed

Tags: html, sed, tags

I wish to extract data between known HTML tags. For example:

Hello, <i>I<i> am <i>very</i> glad to meet you.

Should become:

'I

very'

I have found a command that nearly does this. Unfortunately, it only extracts the last entry.

sed -n -e 's/.*<i>\(.*\)<\/i>.*/\1/p'

I can first append a newline character to every end tag </i>, and then this works fine. But is there a way to do it with just one sed command?
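
For illustration, here is a small sketch of that behaviour. The leading `.*` is greedy, so it consumes everything up to the last `<i>` on the line and only the final tagged word is captured. The function name `last_only` is hypothetical, and the sample line is written with a proper closing `</i>` after "I".

```shell
# Hypothetical wrapper around the command from the question.
last_only() {
  # The first .* is greedy and swallows every earlier <i>...</i> pair,
  # so \1 only ever holds the content of the last pair on the line.
  sed -n -e 's/.*<i>\(.*\)<\/i>.*/\1/p'
}

# Prints only the last tagged word.
printf '%s\n' 'Hello, <i>I</i> am <i>very</i> glad to meet you.' | last_only
```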

Nic asked Oct 17 '25 08:10


2 Answers

Give this a try:

sed -n 's|[^<]*<i>\([^<]*\)</i>[^<]*|\1\n|gp'

And your example is missing a "/":

Hello, <i>I</i> am <i>very</i> glad to meet you.
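
As a rough sketch of how this can be tried out, the command can be wrapped in a function and fed the corrected line. This assumes GNU sed, where `\n` in the replacement inserts a newline; other sed implementations may treat it differently. The function name `extract_italic` is hypothetical.

```shell
# Hypothetical wrapper; assumes GNU sed (\n in the replacement is a newline).
extract_italic() {
  # [^<]* cannot cross a tag, so each substitution covers exactly one
  # <i>...</i> pair plus the plain text around it; g repeats this per pair.
  sed -n 's|[^<]*<i>\([^<]*\)</i>[^<]*|\1\n|gp'
}

# Prints each tagged word on its own line, followed by one blank line,
# because the last replacement also ends in a newline.
printf '%s\n' 'Hello, <i>I</i> am <i>very</i> glad to meet you.' | extract_italic
```

Because `[^<]*` stops at any `<`, a line that mixes in other tags would likely need a different pattern.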
Dennis Williamson answered Oct 19 '25 22:10


Try this:

$ sed 's/<[^>]*>//g' file.html
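
Note that this command deletes every tag and keeps all of the surrounding text, so it strips markup instead of extracting only the tagged words. A minimal sketch reading from standard input instead of `file.html` (the function name `strip_tags` is hypothetical):

```shell
# Hypothetical wrapper around the tag-stripping command.
strip_tags() {
  # Remove anything that looks like <...>; text outside the tags is untouched.
  sed 's/<[^>]*>//g'
}

# Prints the whole sentence with the <i> and </i> markers removed.
printf '%s\n' 'Hello, <i>I</i> am <i>very</i> glad to meet you.' | strip_tags
```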
lattimore answered Oct 20 '25 00:10


