
Java PreparedStatement UTF-8 character problem

I have a prepared statement:

PreparedStatement st;

and in my code I try to use the st.setString method:

st.setString(1, userName);

The value of userName is şakça. The setString method changes 'şakça' to '?akça'. It doesn't recognize UTF-8 characters. How can I solve this problem?

Thanks.

kamaci asked Sep 30 '10 08:09


3 Answers

The number of ways this can get screwed up is actually quite impressive. If you're using MySQL, try adding a characterEncoding=UTF-8 parameter to the end of your JDBC connection URL:

jdbc:mysql://server/database?characterEncoding=UTF-8

You should also check that the table / column character set is UTF-8.
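As a rough sketch of how that URL might be used from Java, assuming MySQL Connector/J is on the classpath: the host, database, credentials, and the users/user_name table are placeholders, and the withUtf8 helper is a made-up name for illustration only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class Utf8ConnectionSketch {

    // Hypothetical helper: appends characterEncoding=UTF-8 to a JDBC URL,
    // using '?' or '&' depending on whether the URL already has parameters.
    static String withUtf8(String jdbcUrl) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        String url = withUtf8("jdbc:mysql://server/database");
        System.out.println(url);

        // "user", "password", and the users table are placeholders, not from the question.
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement st = con.prepareStatement(
                     "INSERT INTO users (user_name) VALUES (?)")) {
            // Unicode escapes for "şakça" so the source file's own encoding cannot interfere.
            st.setString(1, "\u015Fak\u00E7a");
            st.executeUpdate();
        } catch (SQLException e) {
            // Without a driver or a reachable server this branch is taken.
            System.err.println("Could not connect or insert: " + e.getMessage());
        }
    }
}
```

The connection parameter only covers the client side; the table and column character set still has to be able to hold the character.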

Joshua Martell answered Oct 24 '22 09:10


Whenever a database changes a character to ?, it simply means that the code point of the character in question is outside the range of the character encoding that the table is configured to use.

As to the cause of the problem: the ç lies within the ISO-8859-1 range and has the same code point there as in Unicode (U+00E7). However, the Unicode code point of ş lies completely outside the range of ISO-8859-1 (U+015F, while ISO-8859-1 only goes up to U+00FF). The DB can't persist the character and replaces it with ?.
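The same effect can be reproduced in plain Java, without any database. This is only an analogy for what the DB does, not its actual code: String.getBytes with ISO-8859-1 substitutes ? for characters the charset cannot represent. The class and method names below are made up for the sketch.

```java
import java.nio.charset.StandardCharsets;

class UnmappableCharSketch {

    // Encodes the text as ISO-8859-1 and decodes it again. Characters that
    // ISO-8859-1 cannot represent are replaced by '?' during encoding.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // True if every character of the text can be encoded in ISO-8859-1.
    static boolean fitsInLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String userName = "\u015Fak\u00E7a"; // "şakça" written with Unicode escapes
        System.out.println(fitsInLatin1("\u00E7"));    // ç (U+00E7) is within range
        System.out.println(fitsInLatin1("\u015F"));    // ş (U+015F) is not
        System.out.println(roundTripLatin1(userName)); // ş is lost, ç survives
    }
}
```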

So, I suspect that your DB table is still configured to use ISO-8859-1 (or one of the other compatible ISO-8859 encodings in which ç has the same code point as in Unicode).

The Java/JDBC API is doing its job perfectly fine with regard to character encoding (Java uses Unicode all the way), and the JDBC DB connection encoding is also configured correctly. If Java/JDBC had incorrectly used ISO-8859-1, the persisted result would have been ÅakÃ§a (with an invisible control character after the Å): the ş consists of the UTF-8 bytes 0xC5 and 0x9F, which represent Å and an unprintable control character in ISO-8859-1, and the ç consists of the UTF-8 bytes 0xC3 and 0xA7, which represent Ã and § in ISO-8859-1.
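To see what that misinterpretation would look like, here is a small sketch that encodes the string as UTF-8 and then deliberately decodes the bytes as ISO-8859-1. The class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

class MojibakeSketch {

    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which is the mistake described above.
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Formats the UTF-8 bytes of the text as space-separated hex pairs.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u015F")); // bytes of ş
        System.out.println(utf8Hex("\u00E7")); // bytes of ç
        // Seven characters instead of five; the second one is the control character U+009F.
        System.out.println(utf8ReadAsLatin1("\u015Fak\u00E7a"));
    }
}
```

A stored value of '?akça' does not match this pattern, which is why the connection encoding is unlikely to be the culprit here.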

BalusC answered Oct 24 '22 08:10


setString methods changes 'şakça' to '?akça'

How do you know that setString changes this? Or did you look at the content in the database and conclude this from what you saw?

It could be that the database is not configured for UTF-8, or simply that the tool you use to see the contents of the database (SQL*Plus for Oracle...) is not capable of displaying UTF-8.
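One way to take the display tool out of the picture is to read the value back in Java and print its code points instead of its glyphs. A minimal sketch, with a made-up helper name; in practice the string would come from something like ResultSet.getString rather than a literal:

```java
class CodePointDump {

    // Renders each code point of the value as U+XXXX, separated by spaces,
    // so the result does not depend on what the console or tool can display.
    static String describe(String value) {
        StringBuilder sb = new StringBuilder();
        value.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Intact value: starts with U+015F (ş).
        System.out.println(describe("\u015Fak\u00E7a"));
        // Damaged value: starts with U+003F, a real question mark.
        System.out.println(describe("?ak\u00E7a"));
    }
}
```

If the dump starts with U+015F, the data is stored correctly and only the viewing tool is at fault; if it starts with U+003F, the character was already lost on the way into the database.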

Nivas answered Oct 24 '22 09:10