
PHP match control characters but not whitespace?

Tags: regex, php

Using the POSIX character classes, how can I match [:cntrl:] while excluding [:space:]?

$message =  ereg_replace("[[:cntrl:]]", "", $message);
Howard asked Mar 03 '12 10:03


1 Answer

The ereg_* (POSIX) functions have been deprecated since PHP 5.3 and were removed in PHP 7. You should not continue using them.

According to POSIX Bracket Expressions, [:cntrl:] resolves to the ASCII range [\x00-\x1F\x7F] (the Unicode counterpart is \p{Cc}, which also covers U+0080-U+009F) and [:space:] resolves to [ \t\r\n\v\f]. Looking those characters up on asciitable.com gives an exclusion list of [\x20\x09-\x0D]. The space (\x20) lies outside the control range anyway, so subtracting \x09-\x0D from [\x00-\x1F\x7F] leaves [\x00-\x08\x0E-\x1F\x7F]. That yields the following sanitization, compatible with PHP 5.3 and upward:

$message = preg_replace('/[\x00-\x08\x0E-\x1F\x7F]+/', '', $message);
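As a sanity check on the arithmetic above, the sketch below walks every ASCII byte and compares the hand-derived class with a pattern that uses the POSIX classes directly. The lookahead pattern is an alternative that is not part of the original answer; it relies on PCRE accepting POSIX class names inside brackets and assumes the default "C" locale tables. The variable names are illustrative only.

```php
<?php
// Hand-derived class from the answer: control characters minus whitespace.
$derived = '/[\x00-\x08\x0E-\x1F\x7F]/';

// Alternative (an assumption, not from the answer): a negative lookahead
// rejects anything in [:space:] before [:cntrl:] is allowed to match.
$posix = '/(?![[:space:]])[[:cntrl:]]/';

$mismatches = [];
for ($byte = 0; $byte <= 0x7F; $byte++) {
    $char = chr($byte);
    // preg_match returns 1 on a match and 0 otherwise.
    if (preg_match($derived, $char) !== preg_match($posix, $char)) {
        $mismatches[] = sprintf('0x%02X', $byte);
    }
}

// An empty list means both patterns agree on every ASCII byte.
echo $mismatches ? implode(', ', $mismatches) : 'patterns agree', "\n";
```

If the two patterns ever disagreed, the printed list would name the offending bytes, which makes it easy to see whether the ranges or the locale assumption are at fault.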

Note that VT (Vertical Tab) and FF (Form Feed, New page) are also preserved. Depending on your situation you might want to remove these, too:

$message = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/', '', $message);
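Both variants can be wrapped in one small helper, sketched here under a few assumptions: the function name strip_control_chars and its $keepVtAndFf flag are hypothetical, and the type declarations and ?? operator require PHP 7 or later rather than the 5.3 baseline mentioned above. VT is \x0B and FF is \x0C; LF (\x0A) is deliberately left alone.

```php
<?php
/**
 * Hypothetical helper; name and signature are illustrative only.
 * Removes ASCII control characters while keeping space, tab, LF and CR.
 * VT (\x0B) and FF (\x0C) are kept only when $keepVtAndFf is true.
 */
function strip_control_chars(string $text, bool $keepVtAndFf = true): string
{
    $pattern = $keepVtAndFf
        ? '/[\x00-\x08\x0E-\x1F\x7F]+/'
        : '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/';

    // preg_replace returns null on error; fall back to the unchanged input.
    return preg_replace($pattern, '', $text) ?? $text;
}

// Sample input: NUL, tab, VT, DEL and a trailing newline mixed into text.
$raw = "a\x00b\tc\x0Bd\x7Fe\n";

var_dump(strip_control_chars($raw) === "ab\tc\x0Bde\n");   // VT kept
var_dump(strip_control_chars($raw, false) === "ab\tcde\n"); // VT removed
```

Keeping the stricter pattern behind a flag lets callers decide per input whether vertical tabs and form feeds count as meaningful whitespace.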
rodneyrehm answered Oct 04 '22 01:10