
How do I identify whether a string contains a special character that cannot be stored using the utf8mb4 character set?

Refer to this tweet and the following thread, where we are trying to store a similar tweet in the database. I am unable to store this tweet in MySQL. I would like to know how to identify whether a string contains a character that cannot be processed by the utf8mb4 character set, so that I can avoid storing it.

priya asked Jan 09 '12 06:01


1 Answer

The character causing the problem is U+1F603 SMILING FACE WITH OPEN MOUTH, whose code point is not representable in 16 bits. Encoded as UTF-8 it is the four bytes f0 9f 98 83, which should fit without issue in a MySQL column that uses the utf8mb4 character set, so I agree with the other commenters that this does not look like a MySQL issue. If you can re-insert this tweet, log all SQL statements as received by MySQL to determine whether the characters get corrupted before or after they are sent to MySQL.
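To check a string on the application side before it reaches the database, something like the following Python sketch may help. It is only an illustration, and the function names (utf8_hex, non_bmp_chars, is_encodable_utf8) are made up for this example. It assumes that the characters of interest are those outside the Basic Multilingual Plane (code points above U+FFFF), which need four bytes in UTF-8, and that the only strings that cannot be encoded as UTF-8 at all are those containing lone surrogates.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of a string as space-separated hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


def non_bmp_chars(text):
    """Return the characters above U+FFFF, i.e. those that need
    four bytes in UTF-8 (hypothetical helper for this example)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def is_encodable_utf8(text):
    """Return False if the string cannot be encoded as UTF-8,
    for example because it contains a lone surrogate."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    smiley = "\U0001F603"  # SMILING FACE WITH OPEN MOUTH
    print(utf8_hex(smiley))  # f0 9f 98 83
    print(non_bmp_chars("hello " + smiley) == [smiley])  # True
    print(is_encodable_utf8("hello " + smiley))  # True
```

If non_bmp_chars returns a non-empty list for a tweet that fails to store, that suggests some layer between the application and the column (connection character set, driver, or table definition) is limited to three-byte UTF-8, which is worth checking before filtering the characters out.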

Tassos Bassoukos answered Oct 14 '22 15:10