Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UTF-8 won't persist on Hibernate + MySQL

I'm trying to save some values in MySQL database by using Hibernate, but most Lithuanian characters won't get saved, including ąĄ čČ ęĘ ėĖ įĮ ųŲ ūŪ(they are saved as ?), however, šŠ žŽ do get saved.

If I do inserts manually, then those values are properly saved, so the problem is most likely in Hibernate configuration.

What I have tried so far:

hibernate.charset=UTF-8
hibernate.character_encoding=UTF-8
hibernate.use_unicode=true

---------

properties.put(PROPERTY_NAME_HIBERNATE_USE_UNICODE,
            env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_USE_UNICODE));
    properties.put(PROPERTY_NAME_HIBERNATE_CHARSET,
            env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARSET));
    properties
            .put(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING,
                    env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING));

---------

private void registerCharachterEncodingFilter(ServletContext aContext) {
    CharacterEncodingFilter cef = new CharacterEncodingFilter();
    cef.setForceEncoding(true);
    cef.setEncoding("UTF-8");
    aContext.addFilter("charachterEncodingFilter", cef)
            .addMappingForUrlPatterns(null, true, "/*");
}

As described here

I tried adding ?useUnicode=true&characterEncoding=utf-8 to db connection url.

As described here

I ensured that my db is set to UTF-8 charset. phpmyadmin > information_schema > schemata

def db_name utf8 utf8_lithuanian_ci NULL

This is how I save into db:

//Controller
buildingService.addBuildings(schema.getBuildings());
        List<Building> buildings = buildingService.getBuildings();
        System.out.println("-----------");
        for (Building b : schema.getBuildings()) {
            System.out.println(b.toString());
        }
        System.out.println("-----------");
        for (Building b : buildings) {
            System.out.println(b.toString());
        }
        System.out.println("-----------");

//Service:
@Override
public void addBuildings(List<Building> buildings) {
    for (Building b : buildings) {
        getCurrentSession().saveOrUpdate(b);
    }
}

First set of println contains all Lithuanian characters, while second replaces most with ?

EDIT: Added details

insert into buildings values (11,'ąĄčČęĘ', 'asda');    
select short, hex(short) from buildings;
//Šalt. was inserted via hibernate
//letters are properly displayed:
ąĄčČęĘ       | C485C484C48DC48CC499C498
MIF Šalt.    | 4D494620C5A0616C742E  

select address, hex(address) from buildings;
 Šaltini? <...> | C5A0616C74696E693F20672E2031412C2056696C6E697573
//should contain "ų"
--------
show create table buildings;
buildings | CREATE TABLE `buildings` (
  `id` int(11) NOT NULL,
  `short` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
  `address` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_lithuanian_ci 

EDIT: I did not find a proper solution, so I came up with a workaround. I ended up escaping/unescaping characters, storing them like this: \uXXXX.

like image 413
Marius Avatar asked Mar 29 '15 16:03

Marius


1 Answers

Let's verify that they were stored correctly... Please do SELECT col, HEX(col) ... to fetch some cell with Lithuanian characters. A correctly stored ą will show C485. The others should show various hex values of C4xx or C5xx. 3F is ?.

But, more importantly, 4 characters do show. Š should be C5A0 if properly stored as utf8. However, I suspect, you will see 8A, implying that the column in the table is really declared as CHARACTER SET latin1. (The 4 characters show up in the first column of my charset blog ).

Do SHOW CREATE TABLE to see how the column is defined. If it says latin1, then the problem is with the table definition, and you probably ought to start over.

like image 75
Rick James Avatar answered Sep 21 '22 05:09

Rick James