
Issue while saving Non-English character

We are working on an application that needs to save data in Gujarati.

The technologies used in the application are listed below.

  • Spring MVC Version 4.1.6.RELEASE
  • Hibernate Version 4.3.5.Final
  • MySQL 6.0.11

My JSP is configured with

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

And

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Hibernate configuration is

<prop key="hibernate.connection.useUnicode">true</prop>
<prop key="hibernate.connection.characterEncoding">UTF-8</prop>
<prop key="hibernate.connection.charSet">UTF-8</prop>

MySQL URL is

jdbc:mysql://host:port/dbName?useUnicode=true&connectionCollation=utf8_general_ci&characterSetResults=utf8

The POJO has a String field to hold that data.

The MySQL column is a VARCHAR with charset=utf8 and collation=utf8_general_ci.

When I try to save any non-English (Gujarati) characters, garbage such as àª?à«?àª? is stored instead of "ગુજ".

Is there any other configuration that I have missed?

Yogesh Prajapati asked Aug 04 '15 16:08


3 Answers

I faced the same problem while inserting Tamil characters into the database. After a lot of searching I found a solution that works for me, and I am sharing it here. I hope it helps clear up your doubts about non-English characters.

INSERT INTO 
STUDENT(name,address) 
VALUES 
(N'பெயர்', N'முகவரி');

I am using a sample table because you have not provided your table structure or field names.

Venkatvasan answered Nov 03 '22 15:11


I am assuming you want ગુજ (GA JA with Vowel sign U)?

I think you somehow specified "latin5". (Yes I see you have UTF-8 everywhere, but "latin5" is the only way I can make things work.)

CONVERT(CONVERT(UNHEX('C3A0C2AAC297C3A0C2ABC281C3A0C2AAC29C')
       USING utf8) USING latin5) = 'ગુજ'

Plus you ended up with "double encoding"; I suspect this is what happened:

  • The client had characters encoded as utf8 (good); and
  • SET NAMES latin5 was used, but it lied by claiming that the client had latin5 encoding; and
  • The column in the table declared CHARACTER SET utf8 (good).
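
To make the three steps above concrete, here is a small sketch in plain Java (no database involved) that reproduces the same byte pattern. It assumes ISO-8859-1 as a stand-in for MySQL's latin5; the two agree for every byte that occurs in this example. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {

    // Hex-encode bytes in upper case, in the same style as MySQL's HEX().
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Simulate the mistake: the client sends UTF-8 bytes, but they are
    // interpreted as a single-byte charset, one character per byte.
    // Assumption: ISO-8859-1 stands in for MySQL's latin5 here.
    static String doubleEncode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // GA, vowel sign U, JA written as escapes so the source file
        // encoding cannot interfere.
        String original = "\u0A97\u0AC1\u0A9C";

        // Correctly stored bytes: E0AA97E0AB81E0AA9C
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));

        // Double-encoded bytes: C3A0C2AAC297C3A0C2ABC281C3A0C2AAC29C
        String garbled = doubleEncode(original);
        System.out.println(hex(garbled.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The second line of output matches the hex string used in the CONVERT expression above, which is why I believe this is what happened to the data.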

If possible, it would be better to start over -- empty the tables, be sure to have SET NAMES utf8 or establish utf8 when connecting from your client to the database. Then repopulate the tables.
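
For a Java client, one way to establish utf8 at connect time is to put Connector/J's characterEncoding property in the JDBC URL. The sketch below only builds the URL string; the helper name and the host, port and database values are placeholders, and whether this alone fixes the setup in the question is an assumption I have not verified.

```java
class Utf8JdbcUrl {

    // Builds a MySQL Connector/J URL that requests UTF-8 on the connection.
    // host, port and dbName are placeholders supplied by the caller.
    static String build(String host, int port, String dbName) {
        return "jdbc:mysql://" + host + ":" + port + "/" + dbName
                + "?useUnicode=true"
                + "&characterEncoding=UTF-8"
                + "&connectionCollation=utf8_general_ci";
    }

    public static void main(String[] args) {
        // Hypothetical connection details, for illustration only.
        System.out.println(build("localhost", 3306, "dbName"));
    }
}
```

If the URL lives in an XML file, each & has to be written as &amp;.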

If you would rather try to recover the existing data, this might work:

-- col already holds the double-encoded text, so it is converted directly (no UNHEX)
UPDATE ... SET col = CONVERT(BINARY(CONVERT(col USING latin5)) USING utf8);

But you would need to do that for each messed up column in each table.

A partial test of that UPDATE is to run the same expression in a SELECT:

SELECT CONVERT(BINARY(CONVERT(col USING latin5)) USING utf8)
     FROM table;

I say "partial test" because output that looks right does not prove that the stored bytes are right.
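
If you would rather experiment outside the database first, the same reversal can be sketched in Java. This is only an illustration of the idea, not a replacement for the SQL: it assumes ISO-8859-1 in place of latin5, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {

    // Undo one level of double encoding: turn each character back into the
    // single byte it came from, then decode those bytes as UTF-8.
    // Assumption: ISO-8859-1 stands in for MySQL's latin5.
    static String repair(String garbled) {
        // If any character does not fit in one Latin-1 byte, the text was
        // not produced by this kind of double encoding; leave it alone.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(garbled)) {
            return garbled;
        }
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build the garbled form of GA, vowel sign U, JA, then repair it.
        String original = "\u0A97\u0AC1\u0A9C";
        String garbled = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(repair(garbled).equals(original));
    }
}
```

Be careful with text that is genuinely Latin-1 (for example, accented Western European letters): this method would damage it, just as the UPDATE would damage columns that were never double encoded.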

After the UPDATE, SELECT HEX(col) should return E0AA97E0AB81E0AA9C for ગુજ. Note that most Gujarati hex should be of the form E0AAyy or E0AByy. You might also find 20 for a blank space.
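
If there are many rows to check, that rule of thumb can be turned into a rough test. The sketch below is a heuristic only: it accepts E0AAyy, E0AByy and 20, does not validate the yy byte, and will reject legitimate text that mixes in digits or ASCII punctuation. The names are hypothetical.

```java
class GujaratiHexCheck {

    // Returns true if the upper-case hex string (as produced by HEX(col))
    // consists only of E0AAyy / E0AByy groups and 20 (space).
    static boolean looksLikeGujaratiUtf8(String hex) {
        if (hex.isEmpty()) {
            return false;
        }
        int i = 0;
        while (i < hex.length()) {
            if (hex.startsWith("20", i)) {
                i += 2; // one ASCII space
            } else if ((hex.startsWith("E0AA", i) || hex.startsWith("E0AB", i))
                    && i + 6 <= hex.length()) {
                i += 6; // one three-byte Gujarati character
            } else {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeGujaratiUtf8("E0AA97E0AB81E0AA9C"));
        System.out.println(looksLikeGujaratiUtf8("C3A0C2AAC297C3A0C2ABC281C3A0C2AAC29C"));
    }
}
```

The first call prints true and the second prints false, since the double-encoded form starts with C3A0.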

I apologize for not being more certain. I have been tackling Character Set issues for a decade, but this is a new variant.

Rick James answered Nov 03 '22 17:11


There are a couple of things you might have missed. I had the same problem with MySQL on Linux; what I had to do was edit my.cnf like this:

[client]
default-character-set = utf8

[mysqld]
character-set-server = utf8

For example, on CentOS this file is located at /etc/my.cnf; on Windows (my PC) it is C:\ProgramData\MySQL\MySQL Server 5.5\my.ini. Please note that the ProgramData folder might be hidden.

Also, if you are using Tomcat, you have to specify UTF-8 for URI encoding. Edit server.xml and modify your main Connector element:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"
           redirectPort="8443" />
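
To see why the URI encoding matters, here is a small sketch using only java.net classes (no Tomcat). It decodes the same percent-encoded value twice, once as ISO-8859-1 and once as UTF-8. The class and method names are made up for this example, and it only imitates what a container does with query strings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UriEncodingDemo {

    // Percent-encode a value as a browser sending UTF-8 would.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode a percent-encoded value using the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // GA, vowel sign U, JA written as escapes.
        String original = "\u0A97\u0AC1\u0A9C";
        String encoded = encodeUtf8(original);
        System.out.println(encoded); // %E0%AA%97%E0%AB%81%E0%AA%9C

        // Wrong charset: nine garbage characters instead of three.
        System.out.println(decode(encoded, "ISO-8859-1").length());

        // Right charset: the original text comes back.
        System.out.println(decode(encoded, "UTF-8").equals(original));
    }
}
```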

Also make sure you have added a character encoding filter to your application:

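import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;

@WebFilter(filterName = "CharacterEncodingFilter", urlPatterns = {"/*"})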
public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig)
            throws ServletException {
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) servletRequest;

        request.setCharacterEncoding("UTF-8");
        servletResponse.setContentType("text/html; charset=UTF-8");

        filterChain.doFilter(request, servletResponse);
    }

    @Override
    public void destroy() {
    }

}

Hope this helps.

Paulius Matulionis answered Nov 03 '22 15:11