I have a CSV file that contains both ASCII and non-ASCII characters, for example "ÅÔÉA". I am not sure about the file's encoding, but when I open it in Notepad, it shows "ANSI" as the encoding.
I read the contents of the CSV as UTF-8:
fr = new InputStreamReader(new FileInputStream(fileName),"UTF-8");
But when I store the data in the DB, the special characters (everything except "A") are not stored properly; they come out scrambled.
I want all the characters to be stored properly. Any ideas?
Encoding is a way to convert data from one format to another. Java String objects hold their text as UTF-16, and that internal representation cannot be changed. The only way to get the text in a different encoding is to convert it to a byte[] array using the charset you need. The conversion goes wrong when the incoming bytes are not actually in the encoding the decoder was told to expect.
The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.
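To make the "mapping between chars and bytes" concrete, here is a minimal sketch that encodes the sample string from the question with a few standard charsets and counts the resulting bytes. The class and method names (CharsetMappingDemo, byteLength) are made up for illustration; only String.getBytes(Charset) and StandardCharsets come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMappingDemo {
    // "ÅÔÉA" written with Unicode escapes, so the encoding of this
    // source file itself cannot distort the example.
    public static final String SAMPLE = "\u00C5\u00D4\u00C9A";

    // Hypothetical helper: number of bytes the string occupies in a charset.
    public static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Four UTF-16 code units in memory, regardless of any file encoding.
        System.out.println(SAMPLE.length());                                  // 4
        // One byte per character in ISO-8859-1.
        System.out.println(byteLength(SAMPLE, StandardCharsets.ISO_8859_1));  // 4
        // Two bytes each for Å, Ô, É and one byte for A in UTF-8.
        System.out.println(byteLength(SAMPLE, StandardCharsets.UTF_8));       // 7
        // Two bytes per character in UTF-16 (big-endian, no byte order mark).
        System.out.println(byteLength(SAMPLE, StandardCharsets.UTF_16BE));    // 8
    }
}
```

The same four characters produce three different byte sequences, which is why the reader has to be told which charset the file was written in.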
"ANSI" in Notepad means whatever code page your Windows installation is using. Try ISO-8859-1; it works in most cases.
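A rough sketch of that suggestion, assuming the file really is in ISO-8859-1 (that still has to be verified for your file). It is the reader from the question with only the charset changed; CsvReaderSketch and readFirstLine are illustrative names, not part of any library.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CsvReaderSketch {
    // Hypothetical helper: reads the first line of a file, decoding its
    // bytes with the charset the caller believes the file was written in.
    public static String readFirstLine(String fileName, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the "ANSI" file is ISO-8859-1; adjust if it is not.
        System.out.println(readFirstLine(args[0], StandardCharsets.ISO_8859_1));
    }
}
```

If your Windows code page turns out to be something else, Charset.forName("windows-1252") or a similar name can be passed instead, provided your JVM supports that charset.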
First of all, you need to know the encoding of the file. Open it with a hex editor. How many bytes does a special character such as "Å" occupy? If it is only one, then the file is not in UTF-8, but more likely in some ISO-8859 or similar Windows encoding (e.g. Windows-1252). As mentioned before, chances are that ISO-8859-1 is the right encoding. For Eastern European languages, ISO-8859-2 would be the right choice.
The second issue would be the character set your database supports for character columns (this parameter is set during installation or creation of a new instance), but since you can insert those characters directly, it won't be a problem in this case.
Which JDBC driver do you use? The thin driver should not cause any problems in that regard, while the OCI driver could create an additional layer of problems if the client's NLS_LANG setting doesn't match the database's character encoding.
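If you would rather check this from code than from a hex editor, one possible approach is to run the raw bytes through a strict UTF-8 decoder and see whether it rejects them. This is only a sketch: EncodingCheck and isValidUtf8 are invented names, and a successful decode does not prove the file is UTF-8 (pure ASCII passes too), while a failure does prove it is not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Hypothetical helper: true if the bytes form a well-formed UTF-8 sequence.
    public static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting
        // the replacement character for malformed input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "ÅÔÉA" as a single-byte (ISO-8859-1 / Windows-1252) file stores it.
        byte[] singleByte = {(byte) 0xC5, (byte) 0xD4, (byte) 0xC9, 0x41};
        // The same text as a UTF-8 file stores it.
        byte[] utf8 = "\u00C5\u00D4\u00C9A".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(singleByte)); // false
        System.out.println(isValidUtf8(utf8));       // true
    }
}
```

In practice you would feed it the bytes from Files.readAllBytes on the CSV; a false result means the UTF-8 reader from the question cannot be right for that file.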