
preg_replace with Cyrillic chars

I want to replace every character matching [^a-zа-з0-9_] with an empty string, but I can't get it to work when the input is a multibyte string.

I tried mb_*, iconv, PCRE, mb_eregi_replace and the u modifier (for PCRE), but none of them worked well.

mb_eregi_replace runs and returns a correct UTF-8 string, but it doesn't replace any characters, whereas preg_replace does replace them with the same regex.

Here is my code. It handles Unicode correctly, but it doesn't replace anything.

function _data($data)
{
  mb_regex_encoding('UTF-8');
  return mb_eregi_replace('/[^a-zа-з0-9_]+/', '', $data);
}

var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));

The result still contains the special characters (#, $, ...), which should have been removed. If I change the function to use preg_replace (without Unicode support), they are removed.

Alex Emilov asked Oct 12 '11 16:10



1 Answer

As long as your input string is UTF-8 encoded (check this first, and re-encode it to UTF-8 if it is not), you can safely use preg_replace, provided the regular expression carries the u (PCRE_UTF8) modifier (that is, the lower-case u at the end):

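function _data($data)
{
  // Remove every run of non-word characters. \w already includes "_",
  // and with the u modifier it also matches non-ASCII letters such as Cyrillic.
  return preg_replace('/[^\w_]+/u', '', $data);
}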

var_dump(namespace\_data('Текст Removethis- and this _#$)( and also this $*@&$'));
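
One caveat that may be worth checking: with the u modifier, \w matches letters and digits from any script, so it keeps more than the Latin / Cyrillic / 0-9 / _ set named in the question. The sketch below compares a \W-based pattern (equivalent in effect to the one above) with a stricter whitelist. The function names stripNonWord and keepLatinCyrillic are hypothetical, and the ranges А-Яа-яЁё are an assumption about which Cyrillic letters should survive (the question wrote а-з).

```php
<?php
// Hypothetical helper names; they do not appear in the original answer.

// Drop everything that is not a word character.
// Under /u, \w covers letters and digits of every script plus "_".
// preg_replace returns null if $data is not valid UTF-8.
function stripNonWord(string $data): ?string
{
    return preg_replace('/\W+/u', '', $data);
}

// Stricter variant (assumption: only Latin letters, Russian Cyrillic letters,
// digits and "_" should survive). /u makes PCRE read pattern and subject as
// UTF-8, so "А-Я" is a range of code points rather than a range of bytes.
function keepLatinCyrillic(string $data): ?string
{
    return preg_replace('/[^A-Za-zА-Яа-яЁё0-9_]+/u', '', $data);
}

$input = 'Текст Removethis- and this _#$)( and also this $*@&$';

var_dump(stripNonWord($input));      // "ТекстRemovethisandthis_andalsothis"
var_dump(keepLatinCyrillic($input)); // "ТекстRemovethisandthis_andalsothis"

// The two differ once other scripts appear in the input.
var_dump(stripNonWord('Текст 日本語 ok_1'));      // "Текст日本語ok_1"
var_dump(keepLatinCyrillic('Текст 日本語 ok_1')); // "Текстok_1"
```

Listing upper- and lower-case ranges explicitly avoids depending on how the i modifier folds case for non-ASCII letters.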

  • \w = any word character (letters, digits and the underscore; with u this includes non-ASCII letters such as Cyrillic)
  • u (at the end) = enable UTF-8 for the regex.
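
As for why the mb_eregi_replace call in the question leaves the text untouched: the mb_ereg* functions take the bare pattern, without PCRE-style delimiters, so the slashes in '/[^...]+/' are treated as literal characters that must appear in the input. A minimal sketch, assuming the mbstring extension is loaded with regex support; the function name stripWithMbEreg and the range а-яё are illustrative assumptions, not part of the question or the answer.

```php
<?php
mb_regex_encoding('UTF-8');

// Hypothetical helper: the whitelist idea from the question, using the
// mb_ereg family. No delimiters: the pattern string is the regex itself.
function stripWithMbEreg($data)
{
    return mb_eregi_replace('[^a-zа-яё0-9_]+', '', $data);
}

// With delimiters, the pattern only matches text that starts and ends with
// a literal "/", so this input comes back unchanged.
var_dump(mb_eregi_replace('/[^a-z0-9_]+/', '', 'Remove-this _#$')); // "Remove-this _#$"

// Without delimiters, the unwanted characters are removed.
var_dump(stripWithMbEreg('Remove-this _#$')); // "Removethis_"
```

Either approach can work on UTF-8 input; preg_replace with the u modifier is simply the more commonly used of the two.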
hakre answered Nov 07 '22 17:11
