In PHP, I need to replace all invalid UTF-8 characters (byte sequences) in a string. However, I don't want them replaced by some equivalent (as the iconv function does with //TRANSLIT) but by a character of my choice (like "_" or "*", for example).
The idea is that the user can see the positions where the invalid characters were found.
I didn't find any function that does this, so I was going to use iconv with //IGNORE.
Do you see a better way to do that? Are there functions in PHP that can be combined to get this behavior?
Thanks for your help.
Here are two preg_replace() calls that get you close to what you want:
// reject control characters (except tab, LF and CR), stray continuation bytes, overly long 2 byte sequences
// and all 4 byte sequences (characters from U+10000 up), and replace each match with ?
$some_string = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]'.
 '|(?<=[\x00-\x7F])[\x80-\xBF]+'.
 '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'.
 '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'.
 '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
 '?', $some_string );
//reject overly long 3 byte sequences and UTF-16 surrogates and replace with ?
$some_string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]'.
 '|\xED[\xA0-\xBF][\x80-\xBF]/S','?', $some_string );
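Note that you can change the replacement (currently '?') to anything else, such as '_' or '*', by changing the second argument of each preg_replace() call. Keep in mind that each match is replaced as a whole, so a run of several invalid bytes can become a single '?', and that these patterns also replace control characters and every 4-byte sequence (all characters from U+10000 up, such as emoji), even though those are valid UTF-8.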
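If you want exactly one replacement character per invalid byte, so that the user can see where each bad byte was, another approach is to match either a run of well-formed UTF-8 sequences or a single stray byte, and keep only the former. This is a sketch assuming PHP 8; `replaceInvalidUtf8()` and `UTF8_VALID_RUN_OR_BYTE` are hypothetical names, and the byte ranges come from the "well-formed UTF-8 byte sequences" table of the Unicode Standard. Unlike the patterns above, it keeps control characters and characters above U+FFFF, since those are valid UTF-8.

```php
<?php
declare(strict_types=1);

// Sketch, assumes PHP 8. Matches either a run of up to 100 well-formed UTF-8
// sequences (captured in group 1) or, failing that, exactly one byte.
// Byte ranges follow the Unicode "well-formed UTF-8 byte sequences" table, so
// overlong forms, UTF-16 surrogates and code points above U+10FFFF are never
// part of a valid run. No /u flag: PCRE works on raw bytes here.
const UTF8_VALID_RUN_OR_BYTE = '/('
    . '(?:[\x00-\x7F]'                     // U+0000..U+007F
    . '|[\xC2-\xDF][\x80-\xBF]'            // U+0080..U+07FF
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'        // U+0800..U+0FFF
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // U+1000..U+CFFF, U+E000..U+FFFF
    . '|\xED[\x80-\x9F][\x80-\xBF]'        // U+D000..U+D7FF (no surrogates)
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'     // U+10000..U+3FFFF
    . '|[\xF1-\xF3][\x80-\xBF]{3}'         // U+40000..U+FFFFF
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'     // U+100000..U+10FFFF
    . '){1,100}'                           // bounded so each match stays short
    . ')|./s';                             // otherwise: a single invalid byte

// Hypothetical helper: replaces every byte that is not part of a well-formed
// UTF-8 sequence with $replacement (one replacement per invalid byte).
function replaceInvalidUtf8(string $input, string $replacement = '_'): string
{
    $result = preg_replace_callback(
        UTF8_VALID_RUN_OR_BYTE,
        // Group 1 is set (and non-empty) only when a valid run matched.
        static fn (array $m): string =>
            isset($m[1]) && $m[1] !== '' ? $m[1] : $replacement,
        $input
    );
    if ($result === null) {
        throw new RuntimeException(preg_last_error_msg());
    }
    return $result;
}

echo replaceInvalidUtf8("caf\xC3\xA9 \xFF ok \xE2\x82 end"), PHP_EOL; // café _ ok __ end
```

The repetition is capped at 100 sequences per match because an unbounded `+` over a long valid string may run into PCRE's backtracking or stack limits on some builds; the cap only splits long valid text into several matches and does not change the result.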
The original article: http://magp.ie/2011/01/06/remove-non-utf8-characters-from-string-with-php/
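To actually show the user where the problems are, a pattern that matches either a run of well-formed UTF-8 sequences or a single stray byte can be combined with `preg_match_all()` and `PREG_OFFSET_CAPTURE` to collect the position of every invalid byte. This is a sketch assuming PHP 8; `findInvalidUtf8Offsets()` is a hypothetical name, and the offsets it returns count bytes, not characters.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (sketch, assumes PHP 8): returns the byte offset of
// every byte in $input that is not part of a well-formed UTF-8 sequence.
function findInvalidUtf8Offsets(string $input): array
{
    // Group 1 captures up to 100 well-formed UTF-8 sequences (Unicode
    // well-formed byte ranges); the final "." branch matches one stray byte.
    $pattern = '/((?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2}){1,100})|./s';

    if (preg_match_all($pattern, $input, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE) === false) {
        throw new RuntimeException(preg_last_error_msg());
    }

    $offsets = [];
    foreach ($matches as $m) {
        // Each entry is [text, byteOffset]. Group 1 is absent (or reported
        // with offset -1) when only the single-byte branch matched.
        if (!isset($m[1]) || $m[1][1] === -1) {
            $offsets[] = $m[0][1];
        }
    }
    return $offsets;
}

$input = "caf\xC3\xA9 \xFF ok \xE2\x82 end";
echo implode(', ', findInvalidUtf8Offsets($input)), PHP_EOL; // 6, 11, 12
```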