Refer to this tweet and the following thread were we are trying to store a similar tweet into the database. I am unable to store this tweet in MySQL, I would like to know how to identify, if the string contains a character which cannot be processed by the utf8-mb4 character set, so that I can avoid storing it.
To check if a string contains special characters, call the test() method on a regular expression that matches any special character. The test method will return true if the string contains at least 1 special character and false otherwise.
Explanation: Given string contains only special characters. Therefore, the output is Yes.
utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utfmb4 instead.
The difference between utf8 and utf8mb4 is that the former can only store 3 byte characters, while the latter can store 4 byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.
The character that poses a problem for you is U+1F603 SMILING FACE WITH OPEN MOUTH
, which has a value not representable in 16 bits. When converted to UTF-8 the byte values are f0 9f 98 83
, which should fit without issues in a utf8mb4
character set MySQL column, so I will agree with the other commenters that it doesn't look to be a MySQL issue. If you can attempt to re-insert this tweet, log all SQL statements as received by MySQL to determine if the characters get corrupted before or after sending them to MySQL.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With