
character taking 6 bytes

We are trying to save the string below to the database. It is a name that we receive from an API call:

株式会社エス・ダブリュー・コミュニケーションズ

When saving through our code (servlet, then Hibernate, then the database), we get an error:

Caused by: java.sql.BatchUpdateException: ORA-12899: value too large for column "NAME_ON_ACCOUNT" (actual: 138, maximum: 100)

The name is 23 characters long, but it looks like each character is taking 6 bytes, since that is the only way to arrive at 138 (23 × 6 = 138).

The code below gives me 69:

byte[] utf8Bytes = string.getBytes("UTF-8");    
System.out.println(utf8Bytes.length);

And this gives me 92:

byte[] utf8Bytes = string.getBytes("UTF-32");
System.out.println(utf8Bytes.length);
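
For comparison, here is a minimal sketch that puts the two measurements above side by side with UTF-16. The class name EncodedLengths and the helper byteLength are made up for illustration. UTF-16BE is used instead of "UTF-16" because Java's "UTF-16" encoder prepends a 2-byte byte-order mark, which would skew the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class EncodedLengths {
    // Number of bytes the string occupies when encoded in the given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String name = "株式会社エス・ダブリュー・コミュニケーションズ";
        // Count code points rather than chars, so surrogate pairs would not be double-counted.
        int characters = name.codePointCount(0, name.length());
        System.out.println("characters: " + characters);

        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            Charset.forName("UTF-32")
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + byteLength(name, cs) + " bytes");
        }
    }
}
```

For this name it reports 23 characters and 69, 46 and 92 bytes respectively, so none of the standard Unicode encodings comes close to 6 bytes per character.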

I will surely check NLS_CHARACTERSET and see the IO classes but have you ever seen a character taking 6 bytes? Any help will be much appreciated.

pankaj gambhir asked Apr 02 '13 18:04



1 Answer

The string probably holds HTML entities, like &#29123; (the numeric entity for 燃), or possibly the URL style, %8C%9A. Or maybe UTF-7, like [Ay76b. (I made up those values, but your actual ones will be similar.) Relying on a framework for character encoding is always a pain, because its authors were likely from the U.S. or Europe, where a simple "ANSI" code page with one byte per character was enough. If you work out which encoding you actually have and convert it to real UTF-8 or even UTF-16, the name will take up less space in this particular case: 69 bytes in UTF-8 by your own measurement, which fits within the column's maximum of 100.
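
One way to test these guesses is to build each candidate form from the clean name and compare its byte length with the 138 reported by Oracle. The sketch below is only that, a sketch: the class EscapedLengths and its methods are hypothetical, and the third candidate (UTF-8 bytes misread as ISO-8859-1 and then encoded as UTF-8 again) is an extra assumption that does not appear in the question or the answer. It needs Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic class, not part of the original code.
class EscapedLengths {
    // HTML decimal numeric entities, e.g. U+71C3 becomes "&#29123;".
    static String toNumericEntities(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    // URL-style percent-encoding of the UTF-8 bytes, e.g. "%E7%87%83".
    static String toPercentEncoded(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Assumption: the UTF-8 bytes were decoded as ISO-8859-1 somewhere on the way in,
    // so every byte became its own character.
    static String misreadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Size of the string once it is stored as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String name = "株式会社エス・ダブリュー・コミュニケーションズ";
        System.out.println("clean UTF-8:      " + utf8Length(name));
        System.out.println("numeric entities: " + utf8Length(toNumericEntities(name)));
        System.out.println("percent-encoded:  " + utf8Length(toPercentEncoded(name)));
        System.out.println("misread Latin-1:  " + utf8Length(misreadAsLatin1(name)));
    }
}
```

For this name the four lines come out as 69, 184, 207 and 138. Only the misread-and-re-encoded form reproduces the 138 from the error message, so a wrong charset somewhere between the API response and the JDBC driver may be worth checking alongside NLS_CHARACTERSET. That is an observation about the arithmetic, not a confirmed diagnosis.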

Zdenek answered Sep 24 '22 23:09
