How do you remove all the lines between two html tags using sed (or similar)?

Tags: regex, bash, sed

I have a file that looks like this:

<HTML>
<HEAD>
< ... stuff ... ></HEAD>
< ... stuff ... >
</HTML>

I'm trying to remove everything between, and including, the HEAD tags, but can't seem to get it to work.

I thought

sed -i -e 's/<HEAD>.*<\/HEAD>//g' file.HTML

should work, but it doesn't remove anything.

sed -i -e '/<HEAD>/,/<\/HEAD>/d' file.HTML

doesn't do anything either. No errors, just nothing.

Is there something wrong with my input file, or is there a different way to go about it?

user1948374 asked Dec 15 '22 15:12


1 Answer

Delete all lines between the tags, keeping the tag lines themselves:

sed '/<tag>/,/<\/tag>/{//!d}' input.txt

Delete all lines between the tags, including the tag lines:

sed '/<tag>/,/<\/tag>/d' input.txt
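
As a rough illustration, here are both commands applied to a small sample file. The file contents and the output file names (keep_tags.txt, drop_tags.txt, subst.txt) are made up for the demo, and the sketch assumes GNU sed; some other sed implementations may want a `;` before the closing brace in `{//!d}`. It also shows why the `s/<HEAD>.*<\/HEAD>//` attempt from the question does nothing when the tags sit on different lines: sed applies a substitution to one line at a time.

```shell
# Work in a scratch directory so nothing real is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample input, loosely modelled on the file in the question.
cat > input.txt <<'EOF'
<HTML>
<HEAD>
<TITLE>stuff</TITLE>
</HEAD>
<BODY>stuff</BODY>
</HTML>
EOF

# Keep the tag lines, delete what lies between them.
# The empty regex // reuses the last regex tried, so inside the range
# it matches only the two boundary lines; //!d deletes everything else.
sed '/<HEAD>/,/<\/HEAD>/{//!d}' input.txt > keep_tags.txt

# Delete the whole range, tag lines included.
sed '/<HEAD>/,/<\/HEAD>/d' input.txt > drop_tags.txt

# The substitution from the question: no single line contains both
# <HEAD> and </HEAD>, so nothing matches and the output is unchanged.
sed 's/<HEAD>.*<\/HEAD>//g' input.txt > subst.txt

cat keep_tags.txt
cat drop_tags.txt
```

If the range form also leaves a real file untouched, it may be worth checking whether the tags in that file are spelled exactly as in the pattern (for example lowercase `<head>`), since the match is case-sensitive by default.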

To edit the file in place, use sed -i .... To edit in place while keeping a backup of the original, use sed -i.bak ..., which saves the original as input.txt.bak.
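
A minimal sketch of the in-place variant with a backup, again on a made-up sample file. Note that the suffix is attached directly to -i with no space in between.

```shell
# Work in a scratch directory so nothing real is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample input.
cat > input.txt <<'EOF'
<HTML>
<HEAD>
<TITLE>stuff</TITLE>
</HEAD>
<BODY>stuff</BODY>
</HTML>
EOF

# Edit input.txt in place; the original is saved as input.txt.bak.
sed -i.bak '/<HEAD>/,/<\/HEAD>/d' input.txt

# Compare backup and result (diff exits non-zero when the files differ).
diff input.txt.bak input.txt || true
```

Keeping the backup makes it easy to restore the file with `mv input.txt.bak input.txt` if the pattern removed more than intended.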

David C. Rankin answered May 10 '23 23:05