Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Western European Characterset to Turkish in sql

I am having a serious issue with character encoding. To give some background:

  1. I have turkish business users who enter some data on Unix screens in Turkish language.
  2. My database NLS parameter is set to AMERICAN, WE8ISO8859P1 and Unix NLS_LANG to AMERICAN_AMERICA.WE8ISO8859P1.
  3. Turkey business is able to see all the turkish characters on UNIX screens and TOAD while I'm not. I can only see them in Western European Character set.

At business end: ÖZER İNŞAAT TAAHHÜT VE
At our end : ÖZER ÝNÞAAT TAAHHÜT VE

If you notice the turkish characters İ and Ş are getting converted to ISO 8859-1 character set. However, all the settings(NLS paramaters in db and unix) are same at both end- ISO8859-1(Western European)

With some study, I can understand - Turkish machines can display turkish data by doing conversion in real-time(DB NLS settings are overridden by local NLS settings).

Now, I have a interface running in my db- have some PL/SQL scripts(run through shell script) that extracts some data from database and spool them to a .csv file on a unix path. Then that .csv file is transferred to an external system via MFT(Managed File transfer).

The problem is- Exract never conains any turkish character. Every turkish character is getting converted into Western European Characterset and goes like this to the external system which is treated as a case of data conversion/loss and my business is really unhappy.

Could anyone tell me - How could I retain all the turkish characters?

P.S. : External System's characterset could be set to ISP8859-9 charcterset.

Many thanks in advance.

like image 377
prashant1988 Avatar asked Jan 26 '26 12:01

prashant1988


1 Answers

If you are saying that your database character set is ISO-8859-1, i.e.

SELECT parameter, value
  FROM v$nls_parameters
 WHERE parameter = 'NLS_CHARACTERSET'

returns a value of WE8ISO8859P1 and you are storing the data in CHAR, VARCHAR, or VARCHAR2 columns, the problem is that the database character set does not support the full set of Turkish characters. If a character is not in the ISO-8859-1 codepage layout, it cannot be stored properly in database columns governed by the database character set. If you want to store Turkish data in an ISO-8859-1 database, you could potentially use the workaround characters instead (i.e. substituting S for Ş). If you want to support the full range of Turkish characters, however, you would need to move to a character set that supported all those characters-- either ISO-8859-9 or UTF-8 would be relatively common.

Changing the character set of your existing database is a non-trivial undertaking, however. There is a chapter in the Globalization Support Guide for whatever version of Oracle you are using that covers character set migration. If you want to move to a Unicode character set (which is generally the preferred approach rather than sticking with one of the single-byte ISO character sets), you can potentially leverage the Oracle Database Migration Assistant for Unicode.

At this point, you'll commonly see the objection that at least some applications are seeing the data "correctly" so the database must support the Turkish characters. The problem is that if you set up your NLS_LANG incorrectly, it is possible to bypass character set conversion entirely meaning that whatever binary representation a character has on the client gets persisted without modification to the database. As long as every process that reads the data configures their NLS_LANG identically and incorrectly, things may appear to work. However, you will very quickly find that some other application won't be able to configure their NLS_LANG identically incorrectly. A Java application, for example, will always want to convert the data from the database into a Unicode string internally. So if you're storing the data incorrectly in the database, as it sounds like you are, there is no way to get those applications to read it correctly. If you are simply using SQL*Plus in a shell script to generate the file, it is almost certainly possible to get your client configured incorrectly so that the data file appears to be correct. But it would be a very bad idea to let the existing misconfiguration persist. You open yourself up to much bigger problems in the future (if you're not already there) where different clients insert data in different character sets into the database making it much more difficult to disentangle, when you find that tools like the Oracle export utility have corrupted the data that is exported or when you want to use a tool that can't be configured incorrectly to view the data. You're much better served getting the problem corrected early.

like image 195
Justin Cave Avatar answered Jan 28 '26 00:01

Justin Cave



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!