Can PHP detect 4-byte encoded UTF-8 chars?

I am using MySQL tables with the utf8 character set on a MySQL 5.1 server, which does not support the utf8mb4 encoding. When I insert 4-byte UTF-8 characters such as "𡃁", "𨋢", "𠵱", "𥄫", "𠽌" or "𠱁", the insert either fails with an error or everything from that character onward is silently dropped.

How can I programmatically detect 4-byte UTF-8 characters in PHP and replace them?

asked May 11 '13 11:05

Abby Chau Yu Hoi

2 Answers

The following regular expression replaces 4-byte UTF-8 characters (with an empty string unless you pass a $replacement):

function replace4byte($string, $replacement = '') {
    return preg_replace('%(?:
          \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )%xs', $replacement, $string);    
}

var_dump(replace4byte('d'), replace4byte('d𡃁d'));

This doesn't rely on the /u modifier, so you don't need to worry about whether PCRE was compiled with UTF-8 support. However, if you do have that support, deceze's preg_replace_callback approach is neater.

(Regex adapted from Ensuring valid utf-8 in PHP)
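If you only need to detect such characters (the first half of the question), the same byte pattern works with preg_match instead of preg_replace. This is a minimal sketch; the name has4byte is hypothetical, and like replace4byte it operates on raw bytes, so no /u modifier is needed.

```php
<?php
// Hypothetical helper: true if $string contains at least one 4-byte
// UTF-8 sequence (code points U+10000..U+10FFFF). Uses the same byte
// pattern as replace4byte(), so PCRE UTF-8 support is not required.
function has4byte(string $string): bool
{
    return preg_match('%(?:
          \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )%xs', $string) === 1;
}

// "\u{210C1}" is the character 𡃁 (bytes F0 A1 83 81)
var_dump(has4byte('d'), has4byte("d\u{210C1}d")); // bool(false), then bool(true)
```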

answered Oct 10 '22 22:10

cmbuckley

This should work:

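// Skip '': on PHP 8.2+ str_split('') returns [] and max([]) throws a ValueError.
if ($string !== '' && max(array_map('ord', str_split($string))) >= 240) { /* $string contains a 4-byte sequence */ }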

The rationale: code points from U+0800 up to and including U+FFFF are encoded as three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx (lower code points take one or two bytes). Higher code points are encoded as 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, i.e. the leading byte has a value of 240 (0xF0) or higher. In valid UTF-8, no byte of a 1-, 2- or 3-byte sequence reaches 240, so if the string contains any such byte, it contains a 4-byte sequence.
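The byte-pattern argument can be made concrete with a small classifier. This is an illustrative sketch only: utf8ByteRole is a hypothetical name, it assumes PHP 7.4+ (for the arrow function), and it labels bytes purely by their high bits without checking that whole sequences are well formed.

```php
<?php
// Hypothetical helper: label a byte by its role in UTF-8, using only its
// high bits (no validation of whole sequences).
function utf8ByteRole(int $byte): string
{
    if ($byte < 0x80) {
        return 'ascii';        // 0xxxxxxx (0-127)
    }
    if ($byte < 0xC0) {
        return 'continuation'; // 10xxxxxx (128-191)
    }
    if ($byte < 0xE0) {
        return 'lead-2';       // 110xxxxx (192-223)
    }
    if ($byte < 0xF0) {
        return 'lead-3';       // 1110xxxx (224-239)
    }
    if ($byte < 0xF8) {
        return 'lead-4';       // 11110xxx (240-247): U+10000 and above
    }
    return 'invalid';          // 248-255 never occur in UTF-8
}

// "a", the euro sign (U+20AC, 3 bytes) and U+210C1 (4 bytes)
$bytes = str_split("a\u{20AC}\u{210C1}");
print_r(array_map(fn (string $b): string => utf8ByteRole(ord($b)), $bytes));
```

Since continuation bytes stop at 191 and 3-byte lead bytes at 239, only a 4-byte lead byte can reach 240, which is exactly what the max(...) >= 240 check relies on.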

If you want to remove those 4-byte characters, this will do:

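// Requires PCRE UTF-8 support; returns null if $string is not valid UTF-8.
$string = preg_replace_callback('/./u', function (array $match) {
    // drop any character that takes 4 bytes, keep everything else
    return strlen($match[0]) >= 4 ? '' : $match[0];
}, $string);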

There may, however, be a more elegant way to express high code points directly in the regex.
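With the /u modifier, PCRE accepts code-point escapes of the form \x{...}, so the supplementary planes (U+10000 to U+10FFFF, exactly the characters that need 4 bytes in UTF-8) can be matched with a single character class. A minimal sketch, assuming PCRE was built with UTF-8 support; strip4byte is a hypothetical name.

```php
<?php
// Hypothetical helper: remove (or replace) every code point outside the
// Basic Multilingual Plane, i.e. every character encoded with 4 bytes.
// Requires PCRE UTF-8 support; returns null for invalid UTF-8 input.
function strip4byte(string $string, string $replacement = ''): ?string
{
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $string);
}

var_dump(strip4byte("d\u{210C1}d"));             // string(2) "dd"
var_dump(strip4byte("d\u{210C1}d", "\u{FFFD}")); // string(5): U+FFFD takes 3 bytes
```

Unlike the byte-level regex, this version rejects malformed input outright (preg_replace returns null), which may or may not be what you want before an insert.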

answered Oct 10 '22 22:10

deceze
