
Getting question marks when inserting Hebrew characters into a MySQL table

I'm using NetBeans to build a web application with Java and JSP that handles a database with Hebrew fields.

The DDL is as follows:

String cityTable = "CREATE TABLE IF NOT EXISTS hebrew_test.table ("
                            +"id int(11) NOT NULL AUTO_INCREMENT,"
                            +"en varchar(30) NOT NULL,"
                            +"he varchar(30) COLLATE utf8_bin NOT NULL,"
                            +"PRIMARY KEY (id)"
                            +") ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1;";
String insert = "INSERT INTO hebrew_test.table (en, he) VALUES ('A','a')";
String insert2 = "INSERT INTO hebrew_test.table (en, he) VALUES ('B','ב')";
String insert3 = "INSERT INTO hebrew_test.table (en, he) VALUES ('C','אבג')";


executeSQLCommand(cityTable);
executeSQLCommand(insert);
executeSQLCommand(insert2);
executeSQLCommand(insert3);

The output table I get:

1   A   a
2   B   ?
3   C   ???

Instead of:

1   A   a
2   B   ב
3   C   אבג

I tried the answers to "Hebrew appears as question marks in Netbeans", but that isn't the same problem: I get the question marks in the table itself.

Also, I defined the table to use the utf8_bin collation, as you can see in the code above.

Matan Touti asked Jan 15 '23 15:01


2 Answers

You need to tell the JDBC driver to use UTF-8 when encoding the characters of the SQL query into bytes. You can do that by adding the useUnicode=yes and characterEncoding=UTF-8 query parameters to the JDBC connection URL.

jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8

Otherwise it will use the operating system's platform default charset. The MySQL JDBC driver is itself well aware of the encoding used on both the client side (where the JDBC code runs) and the server side (where the DB table is). Any character that is not covered by the charset used by the DB table will be replaced by a question mark.
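To see where the question marks come from, here is a minimal sketch that needs no database. It assumes the charset in effect cannot represent Hebrew and uses ISO-8859-1 as a stand-in for it; the class and method names are made up for illustration. Encoding with a charset that lacks the characters substitutes '?' for each one, while UTF-8 round-trips them:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    // Hebrew "alef bet gimel", written with Unicode escapes so the result
    // does not depend on the encoding of this source file.
    static final String HEBREW = "\u05D0\u05D1\u05D2";

    // Encodes the text to bytes with the given charset and decodes it again,
    // mimicking a trip through a connection that uses that charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // ISO-8859-1 stands in for a default charset without Hebrew (assumption).
        System.out.println(roundTrip(HEBREW, StandardCharsets.ISO_8859_1).equals("???"));
        // UTF-8 covers Hebrew, so the text survives unchanged.
        System.out.println(roundTrip(HEBREW, StandardCharsets.UTF_8).equals(HEBREW));
    }
}
```

Both lines print true. The booleans are printed instead of the strings themselves so that the console's own encoding cannot distort the result.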

See also:

  • Spring Encoding with CharacterEncodingFilter in web.xml
BalusC answered Jan 17 '23 17:01


You're embedding your values directly in the SQL. That's always a bad idea. Use a PreparedStatement with parameterized SQL, and set the values as parameters. It may not fix the problem - but it's definitely the first thing to attempt, as you should be using parameterized SQL anyway. (Parameterized SQL avoids SQL injection attacks, separates code from data, and avoids unnecessary conversions.)
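As a rough sketch of what that could look like for the table in the question: the class name and the insertRow helper are hypothetical, and the caller is assumed to pass in an already open Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ParameterizedInsert {
    // Same table and columns as in the question; the values are bound as
    // parameters instead of being concatenated into the SQL text.
    static final String INSERT_SQL =
            "INSERT INTO hebrew_test.table (en, he) VALUES (?, ?)";

    // Inserts one row and returns the number of affected rows.
    // The caller supplies an open Connection (obtaining it is not shown here).
    static int insertRow(Connection connection, String en, String he) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setString(1, en);
            statement.setString(2, he);
            return statement.executeUpdate();
        }
    }
}
```

A call such as insertRow(connection, "C", "\u05D0\u05D1\u05D2") would then replace the third hard-coded INSERT string.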

Next, you should work out exactly where the problem is really occurring:

  • Make sure that the value you're trying to insert is correct.
  • Check that the value you retrieve is correct.
  • Check what's in your web response using Wireshark: check both the declared encoding and what's in the actual data.

When checking the values, you should iterate over each character in the string and print out the value as a UTF-16 code unit (either use toCharArray() or use charAt() in a loop). Just printing the value to the console leaves too much chance of other problems.
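A small helper along these lines could do the dumping; the class and method names are only illustrative. A correct Hebrew value shows code units in the U+05D0 to U+05EA range, whereas a value that has already been damaged shows U+003F (the question mark):

```java
final class CodeUnitDump {
    // Formats every UTF-16 code unit of the string as U+XXXX, separated by spaces.
    static String toCodeUnits(String text) {
        StringBuilder out = new StringBuilder();
        for (char unit : text.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) unit));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the test data independent of the source file encoding.
        System.out.println(toCodeUnits("\u05D0\u05D1\u05D2")); // U+05D0 U+05D1 U+05D2
        System.out.println(toCodeUnits("???"));                // U+003F U+003F U+003F
    }
}
```

Logging this output just before the insert and just after the read shows on which side of the database the characters are lost.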

EDIT: For a little context of why I wrote this as an answer:

  • In my experience, including string values as parameters rather than directly into SQL can sometimes avoid such issues (and is of course better for security reasons etc).
  • In my experience, diagnosing whether the problem is at the database side or the web side is also important. This diagnosis is best done via logging the exact UTF-16 code units being used, not just strings (as otherwise further encoding issues during logging or console output can occur).
  • In my experience, problems like this can easily occur at either insert or read code paths.

All of this is important as a way of moving the OP forward, not just in a comment-like request for more information.

Jon Skeet answered Jan 17 '23 17:01