I'm using NetBeans to build a web application in Java and JSP that handles a database with Hebrew fields.
The DDL is as follows:
String cityTable = "CREATE TABLE IF NOT EXISTS hebrew_test.table ("
+"id int(11) NOT NULL AUTO_INCREMENT,"
+"en varchar(30) NOT NULL,"
+"he varchar(30) COLLATE utf8_bin NOT NULL,"
+"PRIMARY KEY (id)"
+") ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1;";
String insert = "INSERT INTO hebrew_test.table (en, he) VALUES ('A','a')";
String insert2 = "INSERT INTO hebrew_test.table (en, he) VALUES ('B','ב')";
String insert3 = "INSERT INTO hebrew_test.table (en, he) VALUES ('C','אבג')";
executeSQLCommand(cityTable);
executeSQLCommand(insert);
executeSQLCommand(insert2);
executeSQLCommand(insert3);
The output table I get:
1 A a
2 B ?
3 C ???
Instead of:
1 A a
2 B ב
3 C אבג
I tried the suggestions in "Hebrew appears as question marks in Netbeans", but that isn't the same problem: I get the question marks in the table itself.
I also defined the table as utf8_bin, as you can see in the code above.
You need to tell the JDBC driver to use UTF-8 when encoding the characters of the SQL query into bytes. You can do that by adding the useUnicode=yes and characterEncoding=UTF-8 query parameters to the JDBC connection URL.
jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8
Otherwise it will use the operating system's platform default charset. The MySQL JDBC driver itself is well aware of the encoding used on both the client side (where the JDBC code runs) and the server side (where the DB table is). Any character that is not covered by the charset used by the DB table will be replaced by a question mark.
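As a rough sketch of how that URL might be used from Java (the class name, the helper methods, and the user/password arguments are made up for illustration; the database name is taken from the question's DDL):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of the original question's code.
class Utf8ConnectionSketch {

    // Builds a MySQL JDBC URL carrying the two encoding parameters named above.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=yes&characterEncoding=UTF-8";
    }

    // Opens a connection to the question's hebrew_test database.
    // The credentials are placeholders supplied by the caller.
    static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(
                buildUrl("localhost", 3306, "hebrew_test"), user, password);
    }
}
```

Keeping the URL construction in one place makes it harder to end up with one connection that has the parameters and another that does not.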
You're including your values directly in the SQL. That's always a bad idea. Use a PreparedStatement with parameterized SQL, and set the values as parameters. It may not fix the problem, but it's definitely the first thing to attempt, as you should be using parameterized SQL anyway. (Parameterized SQL avoids SQL injection attacks, separates code from data, and avoids unnecessary conversions.)
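A minimal sketch of what the parameterized insert could look like, assuming you already have an open Connection (the class and method names here are hypothetical; the table and column names come from the question):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper, not part of the original question's code.
class HebrewInsertSketch {

    // The values are no longer part of the SQL text; each ? is a parameter.
    static final String INSERT_SQL =
            "INSERT INTO hebrew_test.table (en, he) VALUES (?, ?)";

    // Inserts one row and returns the number of affected rows.
    static int insertRow(Connection connection, String en, String he)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setString(1, en); // first placeholder: English value
            statement.setString(2, he); // second placeholder: Hebrew value
            return statement.executeUpdate();
        }
    }
}
```

With this shape, the third insert from the question would become something like insertRow(connection, "C", "אבג").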
Next, you should work out exactly where the problem is really occurring:
When checking the values, you should iterate over each character in the string and print out the value as a UTF-16 code unit (either use toCharArray() or use charAt() in a loop). Just printing the value to the console leaves too much chance of other problems.
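Something along these lines would do for the check (the class and method names are just for illustration). The Hebrew test string is written with Unicode escapes so that the source file's own encoding can't interfere:

```java
// Hypothetical diagnostic helper, not part of the original question's code.
class CodeUnitDump {

    // Formats every UTF-16 code unit of the string as U+XXXX, space-separated.
    static String describe(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) value.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The three Hebrew letters from the question's third insert.
        System.out.println(describe("\u05D0\u05D1\u05D2")); // U+05D0 U+05D1 U+05D2
        // What a value already replaced by question marks looks like.
        System.out.println(describe("???"));                // U+003F U+003F U+003F
    }
}
```

If the string shows U+003F before it is ever sent to the database, the damage happened earlier (for example in the source file or request encoding); if it only shows U+003F after reading it back, look at the connection or the table.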
EDIT: For a little context of why I wrote this as an answer:
All of this is important as a way of moving the OP forward, not just in a comment-like request for more information.