
How to remove HTML comments using Regex in Python

Tags:

python

regex

I want to remove HTML comments from a string of HTML. For example, this input:

<h1>heading</h1> <!-- comment-with-hyphen --> some text <-- con --> more text <hello></hello> more text

should result in:

<h1>heading</h1> some text <-- con --> more text <hello></hello> more text
Rushabh Mehta asked Jan 29 '15 06:01


2 Answers

Finally came up with this option:

re.sub("(<!--.*?-->)", "", t)

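Adding the ? after .* makes the quantifier non-greedy, so each match stops at the first --> it finds. Without it, a single match would run from the first <!-- to the last --> in the text and swallow everything in between, including the content between separate comments.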
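To make the difference concrete, here is a small sketch that runs both the greedy and the non-greedy pattern on the sample string from the question. The variable names (t, greedy, lazy) are only illustrative.

```python
import re

# Sample input from the question. Note that "<-- con -->" is NOT a comment
# (it lacks the "!"), but it does end with "-->".
t = ("<h1>heading</h1> <!-- comment-with-hyphen --> some text "
     "<-- con --> more text <hello></hello> more text")

# Greedy: ".*" runs to the LAST "-->" in the string, so real content is lost.
greedy = re.sub(r"<!--.*-->", "", t)

# Non-greedy: ".*?" stops at the FIRST "-->" after "<!--".
lazy = re.sub(r"<!--.*?-->", "", t)

print(greedy)
print(lazy)
```

The spaces on either side of the removed comment are left behind, so the result has two consecutive spaces where the comment used to be. If that matters, whitespace can be collapsed in a separate step.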

Rushabh Mehta answered Oct 03 '22 12:10


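Don't forget line breaks. By default . matches any character except a newline, so a comment that spans several lines is never matched and stays in the output. Pass re.DOTALL so that . matches newlines as well: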

re.sub("(<!--.*?-->)", "", s, flags=re.DOTALL)
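As a quick check of this point, the following sketch applies the same pattern with and without re.DOTALL to a comment that spans two lines. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical input with a comment that contains a line break.
s = "<p>before</p>\n<!-- a comment\nspanning two lines -->\n<p>after</p>"

pattern = r"<!--.*?-->"

# Without DOTALL, "." cannot cross the newline, so nothing matches.
without_flag = re.sub(pattern, "", s)

# With DOTALL, "." also matches "\n" and the whole comment is removed.
with_flag = re.sub(pattern, "", s, flags=re.DOTALL)

print(repr(without_flag))
print(repr(with_flag))
```

The non-greedy ? is still needed together with re.DOTALL. Otherwise a single match could extend across every line up to the last --> in the document.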
John Hua answered Oct 03 '22 10:10