
RegExp to strip HTML comments

Tags: html, regex, php

I'm looking for a sequence of regexp matches and replacements (preferably in PHP, but the language doesn't matter) that turns the input below into the output below. The text at the start and end is just random filler that needs to be preserved.

IN:

fkdshfks khh fdsfsk  <!--g1--> <div class='codetop'>CODE: AutoIt</div> <div class='geshimain'>     <!--eg1-->     <div class="autoit" style="font-family:monospace;">         <span class="kw3">msgbox</span>     </div>     <!--gc2-->     <!--bXNnYm94-->     <!--egc2-->     <!--g2--> </div> <!--eg2--> fdsfdskh 

to this OUT:

fkdshfks khh fdsfsk  <div class='codetop'>CODE: AutoIt</div> <div class='geshimain'>     <div class="autoit" style="font-family:monospace;">         <span class="kw3">msgbox</span>     </div> </div> fdsfdskh 

Thanks.

James Brooks asked Jul 05 '09 20:07



1 Answer

Are you just trying to remove the comments? How about

s/<!--[^>]*-->//g 

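or the slightly better non-greedy version (suggested by the questioner himself), which also matches comments that contain a > character. Add the s (dot-all) modifier if comments can span several lines: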

<!--(.*?)--> 
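To make the difference between the two patterns concrete, here is a minimal sketch in Python (the question allows any language). The names NO_GT, LAZY, strip_comments and the sample string are illustrative assumptions, not part of the original thread. In PHP the same non-greedy pattern can be passed to preg_replace with the s modifier.

```python
import re

# First suggestion: [^>]* cannot cross a '>' inside the comment body.
NO_GT = re.compile(r"<!--[^>]*-->")
# Second suggestion: non-greedy match; DOTALL lets '.' cross newlines.
LAZY = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html, pattern=LAZY):
    """Return html with every match of pattern removed."""
    return pattern.sub("", html)


# Hypothetical sample: one simple comment, one containing '>' and a newline.
sample = "a <!--g1--> <b>x</b> <!-- if a > b\n then --> z"

print(repr(strip_comments(sample, NO_GT)))  # the second comment survives
print(repr(strip_comments(sample, LAZY)))   # 'a  <b>x</b>  z'
```

Neither pattern touches the whitespace around a removed comment, so the result can keep extra spaces or blank lines where comments used to be.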

But remember, HTML is not regular, so using regular expressions to parse it will lead you into a world of hurt when somebody throws bizarre edge cases at it.
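One such edge case, sketched in Python under stated assumptions: a script element whose string literal happens to contain comment-like text. The regex removes it and silently changes the script, while a parser-based pass keeps it. This swaps in the standard library's html.parser as an alternative technique; the class CommentStripper, both helper functions, and the sample markup are hypothetical names made up for illustration, and the re-serialization is only approximate (for example, an end tag's original spacing and case are not preserved).

```python
import re
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Re-emit everything the parser reports except comments."""

    def __init__(self):
        # convert_charrefs=False so entities are reported and re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # raw text of the tag

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

    def handle_comment(self, data):
        pass  # drop real comments


def strip_with_regex(html):
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)


def strip_with_parser(html):
    parser = CommentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical input: the script's string literal only looks like a comment.
tricky = '<script>var s = "<!-- x -->";</script><p>ok<!-- gone --></p>'

print(strip_with_regex(tricky))   # the string literal is emptied as well
print(strip_with_parser(tricky))  # only the real comment is removed
```

For input like the sample in the question, which has no scripts or other odd constructs, the plain regex is enough; the parser route only pays off when the HTML is not under your control.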

Paul Tomblin answered Sep 18 '22 22:09
