We have an old client talking to a legacy server app. In Turkey, it sends text as windows-1254. We store it and send it back.
A row in the database has a name which is "İ", which is a capital I with a dot on top. In windows-1254, this is a 0xdd character and in UTF-8, that is 0xc4b0.
If I look in the database, I see this:
SQL> select dump(name, 16) from thing where other thing;
DUMP(NAME,16)
--------------------------------------------------------------------------------
Typ=1 Len=2: c3,9d
Weird. As someone pointed out in another question, though...
The "İ" character is 0xdd in windows-1254. And it turns out that 0xdd in windows-1252 is the "Ý" character, which is 0xc39d in UTF-8. Hence what we see dumped.
What we think we want to do is this, but it obviously does not work:
SQL> update thing set name = UTL_RAW.CAST_TO_VARCHAR2(UTL_RAW.CONVERT(HEXTORAW('dd'), 'CP1254', 'UTF8')) where otherthing;
update thing set name = UTL_RAW.CAST_TO_VARCHAR2(UTL_RAW.CONVERT(HEXTORAW('dd'), 'CP1254', 'UTF8')) where otherthing
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at "SYS.UTL_RAW", line 327
Huh?
Lots of doc discusses the UTL_RAW.CONVERT and nobody shows an example of it actually being used. Why is that? Is there a list of the allowed NLS Character Set Names? I cannot find one. Any suggestions? I can find a list of all the encodings that java knows about, but I cannot find this list for Oracle.
I have a java application that I am using to write data that must be read by the legacy software. That app receives UTF-8. I can successfully translate that to windows-1254 bytes. I could translate those to windows-1252. If I then translate those to UTF-8, I can write that to the database with:
SQL> update this set name = UTL_RAW.CAST_TO_VARCHAR2(hextoraw('c39d')) where otherthing;
1 row updated.
SQL> select dump(name, 16) from thing where otherthing;
DUMP(NAME,16)
--------------------------------------------------------------------------------
Typ=1 Len=2: c3,9d
And the client shows this row as "İ". But, you know, wow. That just seems absurd. But if that is what will work, that might be what has to happen....
What's not explained in the docs is that, according to UTL_RAW, a characterset is made up of 3 things; the NLS_LANGUAGE, the NLS_TERRITORY and the characterset itself. To see a list of valid values you can query V$NLS_VALID_VALUES
. The full list of languages and territories is also available in the documentation.
This throws up the first of your problems. According to Oracle Win-1254 is not CP1254 but TR8MSWIN1254. Equally, although the UTF8 characterset exists I suspect your database is set up using AL32UTF8. You can double check by querying NLS_DATABASE_PARAMETERS
.
So, if this is in Turkish, and you're in Turkey let's assume your language and territory is just that TURKISH_TURKEY
.
Prepending this to the characterset now returns what you want:
select utl_raw.convert( hextoraw('dd')
, 'TURKISH_TURKEY.AL32UTF8'
, 'TURKISH_TURKEY.TR8MSWIN1254'
) as raw_char
from dual;
RAW_CHAR
-----------------------------------------------------
C4B0
As you've already noted 0xc4b0 is the representation of İ in UTF-8, so you can use UTL_RAW.CAST_TO_VARCHAR2
as you intended 1:
select utl_raw.cast_to_varchar2(
utl_raw.convert( hextoraw('dd')
, 'TURKISH_TURKEY.AL32UTF8'
, 'TURKISH_TURKEY.TR8MSWIN1254'
)) as new_char
from dual;
1. I don't have a text based Oracle client that can represent this character; sorry!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With