Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to handle SQL state [HY000]; error code [1366]; Incorrect string value?

Tags:

java

mysql

utf-8

I'm aware this error means a mysql column doesn't accept the value, but this is strange, since the value fits in a Java UTF-8 encoded string, and the mysql column is utf8_general_ci. Also, all utf8 characters have worked properly so far, apart from these.

The use-case is: I am importing tweets. The tweet in question is: https://twitter.com/bakervin/status/210054214951518212 - you can see the two "strange" characters (and two strange whitespaces between them). The question is - how to handle this:

  • trim these characters (how - which are they, how does the Java UTF-8 differ from MySQL one)
  • make the column capable of accepting this value (how - is there anything more utf-y than utf8_general_ci)
like image 638
Bozho Avatar asked Jun 15 '12 20:06

Bozho


1 Answers

These appear to be unicode surrogate characters. Since they are not actual characters, and it seems MySQL doesn't support them, it is safe to trim them:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < text.length(); i++) {
    char ch = text.charAt(i);
    if (!Character.isHighSurrogate(ch) && !Character.isLowSurrogate(ch)) {
        sb.append(ch);
    }
}
return sb.toString();
like image 77
Bozho Avatar answered Nov 15 '22 21:11

Bozho