Bad UTF-8 encoding when writing to database (reading is OK)

I have a big problem in my web application, which uses JSF and EclipseLink JPA with a MySQL database.

When I read data from the database, JSF reads and writes my characters in UTF-8 correctly, but the characters stored in the database are wrong.

For example, the input "żźćółzxcv" is written to the database as "?????zxcv". But if I enter the data into the database manually, for example "żźćółzxcv", then reading it in JSF works perfectly.

I tried everything from here: Unicode input retrieved via PrimeFaces input components become corrupted

I discovered that the encoding in JSF is fine and that the problem is in Java, because if I set the value manually

current.setUwagiZ("żźćóżźćłąśóżźćł TE");
getFacade().edit(current);

the record in the database is wrong: ???ó??????ó???? TE

I have set characterEncoding and useUnicode in the JDBC Resource. Also, when I execute commands with tools in NetBeans, the encoding is OK and the data in MySQL is in UTF-8, so the connection seems fine.

So the problem is in Java, but I have no idea how to solve it :(

asked Feb 04 '13 13:02 by Fiber


1 Answer

Question marks appear when the messenger itself is aware of the character encoding used on both sides of the transport: it sees that a character cannot be represented in the target encoding and substitutes "?" for it. That is the difference from Mojibake, where the messenger is not at fault; there the producer and/or the consumer uses the wrong encoding.
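The difference between the two failure modes can be reproduced in plain Java, without any database. This is only a sketch: the class and method names are made up for illustration, and ISO-8859-1 is an assumed target encoding, since the question does not say which encoding the driver actually picked. The string literals use Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarksVsMojibake {

    // Question marks: the encoder knows the target charset and replaces
    // every character it cannot represent with '?'.
    public static String encodeLossy(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Mojibake: the bytes are valid UTF-8, but the consumer decodes them
    // with the wrong charset, so each multi-byte character turns into garbage.
    public static String decodeWrongly(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // z-with-dot, z-acute, c-acute, o-acute, l-stroke
        String input = "\u017C\u017A\u0107\u00F3\u0142";
        System.out.println("lossy encode : " + encodeLossy(input));
        System.out.println("wrong decode : " + decodeWrongly(input));
    }
}
```

With ISO-8859-1 as the target, "ó" survives because that charset contains it, while the other four letters become "?". That pattern resembles the second sample in the question, though it does not prove which encoding was in play there.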

In an average web application with a database backend, there are only 2 places where this can happen: communication with the DB and communication with the HTTP client. You've already excluded the HTTP part, which leaves the DB part.

The messenger in the DB part is the JDBC driver. You need to tell the JDBC driver to use UTF-8. The MySQL JDBC driver is known to use the client platform's default encoding by default, which in your particular case is apparently not UTF-8.
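To see what the JVM on the application server considers its default encoding, a quick check along these lines may help. It is a diagnostic sketch only: the class name is hypothetical, and whether the driver consults exactly these values is an assumption, not something the answer states.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {

    // Collects the JVM's view of the platform default encoding.
    public static String describeDefaults() {
        return "Charset.defaultCharset() = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Run it in the same environment as the application (for example from a servlet or a startup listener), because the value can differ between a developer's IDE and the server process.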

Add the following 2 properties to the JDBC connection:

  • useUnicode=true
  • characterEncoding=UTF-8

It's unclear how you've configured the JDBC connection, but if it's "plain vanilla" JDBC, then specify them as a query string in the JDBC URL:

jdbc:mysql://localhost:3306/db_name?useUnicode=true&characterEncoding=UTF-8

Or if it's a container-specific datasource config, then specify them as separate connection properties, exactly the same way as you specify the username and password.
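For plain JDBC, the two properties can also be passed as separate connection properties instead of a query string, which mirrors how a datasource config lists them next to the username and password. The sketch below assumes the host, port and database name from the example URL; the class name, method names and credentials are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class Utf8ConnectionSketch {

    // Taken from the example URL above; adjust to the real database.
    public static final String URL = "jdbc:mysql://localhost:3306/db_name";

    // Builds the connection properties, including the two encoding settings.
    public static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    // Opens a connection; requires the MySQL JDBC driver on the classpath
    // and a reachable database, so it is not called in this sketch.
    public static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(URL, utf8Properties(user, password));
    }
}
```

In a container-managed datasource the same keys, useUnicode and characterEncoding, would go into the pool's property list rather than into Java code.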

See also:

  • Unicode - How to get the characters right?
answered Oct 11 '22 04:10 by BalusC