I want to remove every character matching [^a-zа-з0-9_] (i.e. replace it with an empty string), but I can't get it to work when the string is multibyte.
I tried the mb_* functions, iconv, PCRE, mb_eregi_replace and the u modifier (for PCRE), but none of them worked.
mb_eregi_replace at least returns a correct UTF-8 string, but it doesn't replace the characters, whereas preg_replace does replace them with the same regex.
Here is my code; it handles Unicode, but it doesn't replace anything:
function _data($data)
{
    mb_regex_encoding('UTF-8');
    return mb_eregi_replace('/[^a-zа-з0-9_]+/', '', $data);
}
var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));
The result still contains the special characters (#, $, etc.) when they should have been removed. If I change the function to use preg_replace (without Unicode support), they are removed.
As long as your input string is UTF-8 encoded (test it first, and re-encode it to UTF-8 if it isn't), you can safely use preg_replace with the correct regular expression and the u (PCRE_UTF8) modifier, that is, the lower-case u at the end of the pattern:
function _data($data)
{
    // \w already includes the underscore, so the extra _ in the class is redundant but harmless.
    return preg_replace('/[^\w_]+/u', '', $data);
}
var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));
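A slightly fuller sketch of the same approach, assuming PHP 7 or later; strip_non_word() is a hypothetical helper name, not part of the original answer. It adds the "test if it is UTF-8" step using the common preg_match('//u', ...) idiom, which returns false when the subject is not valid UTF-8. It uses \W+, which is equivalent to [^\w_]+ because \w already includes the underscore. Throwing on invalid input is just one possible choice; re-encoding from a known source encoding is another.

```php
<?php
declare(strict_types=1);

// Hypothetical helper; same idea as _data() above.
function strip_non_word(string $data): string
{
    // preg_match() with the u modifier returns false for invalid UTF-8,
    // and 1 for valid input (the empty pattern always matches).
    if (preg_match('//u', $data) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8; re-encode it first.');
    }
    // \W is the complement of \w; with /u, PHP makes \w Unicode-aware,
    // so Cyrillic letters count as word characters and are kept.
    return preg_replace('/\W+/u', '', $data);
}

echo strip_non_word('Текст Removethis- and this _#$)( and also this $*@&$'), PHP_EOL;
// ТекстRemovethisandthis_andalsothis
```

Spaces are removed too, since they are not word characters; this matches the original class, which also excluded them.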
\w = any word character (letter, digit or underscore)
u (at the end) = enable UTF-8 for the regex; in PHP it also makes \w match Unicode letters and digits, not just ASCII ones.
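As for the mb_eregi_replace attempt in the question: the mb_ereg* functions take the pattern without delimiters, so the slashes in '/[^a-zа-з0-9_]+/' are matched as literal / characters, and since the input contains no slashes nothing is replaced. A minimal sketch of the delimiter-free version, assuming the mbstring extension is built with regex (Oniguruma) support; _data_mb() is a hypothetical name. The class is kept exactly as in the question, but note that а-з covers only the eight letters а through з, so most Cyrillic letters are still stripped (а-я may be what was intended).

```php
<?php
declare(strict_types=1);

// Hypothetical name; mirrors _data() from the question.
function _data_mb(string $data): string
{
    mb_regex_encoding('UTF-8');
    // No /.../ delimiters: mb_ereg* functions take the bare pattern.
    // The "i" in eregi makes a-z match A-Z as well.
    $result = mb_eregi_replace('[^a-zа-з0-9_]+', '', $data);
    if (!is_string($result)) {
        // mb_eregi_replace() returns false (or null) on failure.
        throw new RuntimeException('mb_eregi_replace failed');
    }
    return $result;
}

var_dump(_data_mb('Текст Removethis- and this _#$)( and also this $*@&$'));
```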