
Question regarding the PHP function preg_replace

I want to dynamically remove specific tags and their content from an HTML file. I thought of using preg_replace but can't get the syntax right. Basically it should, for example, do something like: replace everything between (and including) "" with nothing.

Could anybody help me out with this, please?

Argoron asked Dec 17 '22 05:12



2 Answers

Easy dude.

To make a regular expression ungreedy, use the U modifier. To let the dot match across line breaks, use the s modifier (the m modifier, by contrast, only changes how ^ and $ behave). Knowing that, to remove all paragraphs use this pattern:

#<p[^>]*>(.*)?</p>#sU

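Explanation:

  • I use # as the delimiter so I don't have to escape the / in </p> (this keeps the pattern readable)
  • <p[^>]*> : matches an opening paragraph tag, with or without attributes such as a style attribute. Note that it also matches other tags whose name starts with "p", such as <pre>
  • (.*)? : everything up to the first closing tag. The U modifier already makes .* ungreedy; the trailing ? only makes the group optional and can be dropped
  • </p> : obviously, the closing paragraph tag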
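
For reference, here is a minimal sketch of how such a pattern could be passed to preg_replace. The helper name removeTagWithContent and the sample HTML are made up for illustration, and the pattern is a slightly stricter variant of the one above: it adds \b so that "p" does not also match <pre>, and the i modifier so that uppercase tag names are matched too. Like any regex approach, it will not cope with nested tags of the same name.

```php
<?php
// Hypothetical helper (not from the answer above): removes every
// <tag ...>...</tag> block, including its content.
function removeTagWithContent(string $html, string $tag): string
{
    $name = preg_quote($tag, '#');

    // \b stops "p" from also matching <pre> or <param>.
    // s: "." also matches newlines
    // U: quantifiers are ungreedy by default
    // i: tag names are matched case-insensitively
    $pattern = '#<' . $name . '\b[^>]*>.*</' . $name . '\s*>#sUi';

    $result = preg_replace($pattern, '', $html);

    // preg_replace returns null on error; fall back to the input in that case.
    return $result ?? $html;
}

$html = "<div><p class=\"note\">first\nparagraph</p><pre>kept</pre><p>second</p></div>";

echo removeTagWithContent($html, 'p'), PHP_EOL;
// prints <div><pre>kept</pre></div>
```

Building the pattern from a tag name means the same helper can be reused for other tags, for example removeTagWithContent($html, 'script').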

Hope that helps!

Grokwik answered Dec 28 '22 07:12



If you are trying to sanitize your data, it is generally recommended to use a whitelist of allowed tags rather than a blacklist of forbidden terms and tags. A whitelist is easier to maintain and more effective at preventing XSS attacks, because anything you did not explicitly allow is rejected. There's a well-known library called HTML Purifier that, although large and somewhat slow, does a very thorough job of purifying your data.
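
To illustrate the whitelist idea only, here is a minimal sketch using PHP's built-in strip_tags. This is a much simpler technique than HTML Purifier and not a substitute for it: strip_tags leaves the attributes of allowed tags untouched (including event handlers such as onclick) and keeps the text inside removed tags. The helper name keepOnlyAllowedTags and the list of allowed tags are assumptions made for the example.

```php
<?php
// Hypothetical helper: keeps only the whitelisted tags and drops every other tag.
// Caveat: attributes on allowed tags are NOT cleaned, so this alone does not
// protect against XSS.
function keepOnlyAllowedTags(string $html, string $allowed = '<p><a><strong><em>'): string
{
    return strip_tags($html, $allowed);
}

$dirty = '<p>Hello <strong>world</strong></p><script>alert(1)</script>';

echo keepOnlyAllowedTags($dirty), PHP_EOL;
// prints <p>Hello <strong>world</strong></p>alert(1)
```

Note that the script tag is removed but its text content remains, which is one reason a dedicated library is the safer choice for untrusted input.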

Corey Ballou answered Dec 28 '22 06:12
