I'm aware this error means a MySQL column doesn't accept the value, but this is strange, since the value fits in a UTF-8 encoded Java string and the MySQL column uses the utf8_general_ci collation. Also, all other UTF-8 characters have worked properly so far.
The use case: I am importing tweets. The tweet in question is: https://twitter.com/bakervin/status/210054214951518212 - you can see the two "strange" characters (and two strange whitespace characters between them). The question is how to handle this.
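These appear to be UTF-16 surrogates. Java represents each character outside the Basic Multilingual Plane (typically emoji and other supplementary characters) as a pair of surrogate char values, and such a character takes four bytes in UTF-8. MySQL's utf8 character set stores at most three bytes per character, so it rejects them. If you do not need those characters, you can strip them before inserting (the alternative is to switch the column to utf8mb4, available since MySQL 5.5.3):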
// Copy every char except UTF-16 surrogates; both halves of each pair are dropped.
StringBuilder sb = new StringBuilder(text.length());
for (int i = 0; i < text.length(); i++) {
    char ch = text.charAt(i);
    if (!Character.isHighSurrogate(ch) && !Character.isLowSurrogate(ch)) {
        sb.append(ch);
    }
}
return sb.toString();
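If silently deleting the characters is not what you want, a variation on the loop above is to walk the string by code point and substitute a placeholder for each supplementary character. This is only a sketch: the class name NonBmpFilter and the method replaceNonBmp are made up for illustration, and it assumes the target column really is limited to three-byte UTF-8.

```java
public class NonBmpFilter {

    // Hypothetical helper (not from the original answer): replaces every
    // supplementary code point (above U+FFFF) and every unpaired surrogate
    // with the given replacement string. Pass "" to simply strip them.
    static String replaceNonBmp(String text, String replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // Advance by 2 chars for a surrogate pair, 1 otherwise.
            i += Character.charCount(cp);
            if (Character.isSupplementaryCodePoint(cp)
                    || Character.isSurrogate((char) cp)) {
                sb.append(replacement);
            } else {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 written as its surrogate pair; stands in for the tweet text.
        String tweet = "hi \uD83D\uDE00 there";
        System.out.println(replaceNonBmp(tweet, ""));       // strip
        System.out.println(replaceNonBmp(tweet, "\uFFFD")); // mark with U+FFFD
    }
}
```

Working on code points rather than individual chars means a surrogate pair yields one placeholder instead of two, which keeps the replaced text closer to what the user originally typed.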