
UTF-8 string: remove all invisible characters except newline

I'm using the following regex to remove all invisible characters from a UTF-8 string:

$string = preg_replace('/\p{C}+/u', '', $string);

This works fine, but how do I alter it so that it removes all invisible characters EXCEPT newlines? I tried a few variations using [^\n] and the like, but none of them worked.

Thanks for helping out!

Edit: newline character is '\n'

Stefan asked Sep 22 '12 11:09


1 Answer

Use a "double negation":

$string = preg_replace('/[^\P{C}\n]+/u', '', $string);

Explanation:

  • \P{C} is the same as [^\p{C}].
  • Therefore [^\P{C}] is the same as \p{C}.
  • Since we now have a negated character class, we can exclude further characters such as \n from the match simply by adding them to the class.
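
As a rough sketch of how the pattern behaves, the snippet below wraps it in a helper and runs it on a small sample. The function name and the sample string are made up for illustration, and the "\u{200B}" escape assumes PHP 7 or later:

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above.
function strip_invisible_keep_newlines(string $s): string
{
    // [^\P{C}\n] matches any character in Unicode category C except "\n".
    // With the /u modifier, preg_replace() returns null if $s is not
    // valid UTF-8; in that case this sketch returns the input unchanged.
    return preg_replace('/[^\P{C}\n]+/u', '', $s) ?? $s;
}

// Made-up sample: a tab (Cc), a zero-width space U+200B (Cf),
// a carriage return (Cc), and a single newline that should survive.
$input = "foo\tbar\u{200B}baz\r\nqux";

var_dump(strip_invisible_keep_newlines($input) === "foobarbaz\nqux");
```

Note that the carriage return is removed too, so a Windows-style "\r\n" line ending becomes a plain "\n". To keep it as well, \r could be added to the class in the same way.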
Tim Pietzcker answered Sep 29 '22 23:09