It seems that MySQL does not support characters longer than 3 bytes in its default UTF-8 charset. So, in PHP, how can I strip all 4(-and-more)-byte characters from a string, or replace them with some other character?

How to replace/remove 4(+)-byte characters from a UTF-8 string in PHP?

2 Answers

NOTE: Don't just strip these characters; replace them with the replacement character U+FFFD. Silently deleting characters can open the door to Unicode attacks, mostly XSS:

http://unicode.org/reports/tr36/#Deletion_of_Noncharacters

$value = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);
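A minimal sketch of the same replacement wrapped in a function, not part of the original answer. The helper name `replace_four_byte_chars` is hypothetical, and PHP 7 or later is assumed for the type declarations and the `"\u{...}"` string escapes. It also checks the return value, because with the `/u` modifier `preg_replace` returns `null` when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper (name is an assumption, not from the answer).
// Replaces every code point in U+10000..U+10FFFF, i.e. every character
// that needs 4 bytes in UTF-8, with U+FFFD by default.
function replace_four_byte_chars(string $value, string $replacement = "\xEF\xBF\xBD"): string
{
    // /u makes PCRE treat both pattern and subject as UTF-8,
    // so the character class matches whole code points, not bytes.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $value);

    // In /u mode preg_replace() returns null for malformed UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException(
            'preg_replace failed, error code ' . preg_last_error()
        );
    }

    return $result;
}

// U+1F600 is encoded as 4 bytes; it is swapped for U+FFFD.
var_dump(replace_four_byte_chars("a\u{1F600}b") === "a\u{FFFD}b"); // bool(true)
```

Whether to throw, return the input unchanged, or fall back to a byte-level cleanup on malformed input is a design choice; throwing is used here only to make the failure visible.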

answered Oct 14 '22 16:10

glen

Since 4-byte UTF-8 sequences always start with a byte in the range 0xF0-0xF7 (only 0xF0-0xF4 occur in valid UTF-8 as defined by RFC 3629), the following should work:

$str = preg_replace('/[\xF0-\xF7].../s', '', $str);

Alternatively, you could use preg_replace in UTF-8 mode, but this will probably be slower:

$str = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);

This works because 4-byte UTF-8 sequences are used for code points in the supplementary Unicode planes, which run from U+10000 to U+10FFFF.
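To compare the two approaches, here is a small sketch that is not part of the original answer. The function names are hypothetical, and the stricter byte pattern (lead byte 0xF0-0xF4 followed by exactly three continuation bytes 0x80-0xBF) is an assumed variation, not the answer's own regex. On valid UTF-8 the byte-level and UTF-8-mode patterns give the same result; on a truncated sequence the loose `...` pattern also swallows the bytes that follow.

```php
<?php
// Byte-level pattern from the answer: a lead byte plus any three bytes.
function strip_four_byte_loose(string $s): string
{
    return preg_replace('/[\xF0-\xF7].../s', '', $s);
}

// Assumed stricter variant: only matches a lead byte that is followed
// by three real continuation bytes (0x80-0xBF).
function strip_four_byte_strict(string $s): string
{
    return preg_replace('/[\xF0-\xF4][\x80-\xBF]{3}/', '', $s);
}

// Valid UTF-8: U+1F600 takes 4 bytes, U+20AC (euro sign) takes 3.
$valid = "x\u{1F600}y\u{20AC}z";
$utf8Mode = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $valid);

var_dump(strip_four_byte_loose($valid) === $utf8Mode);  // bool(true)
var_dump(strip_four_byte_strict($valid) === $utf8Mode); // bool(true)

// Truncated 4-byte sequence (only two of its bytes) followed by ASCII.
$broken = "\xF0\x9Fxyz";

var_dump(strip_four_byte_loose($broken) === "z");      // bool(true): "x" and "y" were eaten
var_dump(strip_four_byte_strict($broken) === $broken); // bool(true): left untouched
```

If the input is not guaranteed to be valid UTF-8, it may be worth validating it first, for example with `mb_check_encoding($s, 'UTF-8')`, before choosing either pattern.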

answered Oct 14 '22 16:10

nwellnhof


Tags: php, mysql, utf-8

Franz
