 

opencsv CSVWriter using utf-8 doesn't seem to work for multiple languages

I have a very annoying encoding problem with opencsv. When I export a CSV file, I set the character encoding to 'UTF-8'.

CSVWriter writer = new CSVWriter(new OutputStreamWriter(new FileOutputStream("D:/test.csv"), "UTF-8"));

But when I open the CSV file in Microsoft Office Excel 2007, it turns out to have 'UTF-8 BOM' encoding?

Once I save the file in Notepad and re-open it, the file turns back to UTF-8 and all the letters in it appear fine. I think I've searched enough, but I haven't found any solution to prevent my file from turning into 'UTF-8 BOM'. Any ideas, please?

user1213162 asked Apr 13 '12 06:04


2 Answers

I suppose your file is encoded as 'UTF-8 without BOM'. You should write a BOM at the start of the file. A BOM is not necessary in most cases, but one obvious exception is Microsoft Excel.

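FileOutputStream os = new FileOutputStream(file);
// Write the UTF-8 BOM (EF BB BF) before any CSV data
os.write(0xef);
os.write(0xbb);
os.write(0xbf);
// Name the charset explicitly instead of relying on the platform default
CSVWriter csvWrite = new CSVWriter(new OutputStreamWriter(os, "UTF-8"));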

Excel will now recognize the file as UTF-8 CSV.
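
For anyone who wants to check the resulting bytes without opencsv on the classpath, here is a minimal, self-contained sketch of the same idea using only the JDK. The class and method names (BomCsvSketch, writeWithBom, toBytes) are made up for illustration, and the comma joining is deliberately naive: it does no quoting or escaping, which CSVWriter would normally handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of opencsv.
class BomCsvSketch {

    // Writes the UTF-8 signature (EF BB BF), then the rows encoded as UTF-8.
    static void writeWithBom(OutputStream os, String[][] rows) throws IOException {
        os.write(0xEF);
        os.write(0xBB);
        os.write(0xBF);
        Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8);
        for (String[] row : rows) {
            w.write(String.join(",", row)); // naive join: no quoting or escaping
            w.write("\r\n");
        }
        w.flush(); // make sure every encoded byte reaches the stream
    }

    // Convenience wrapper that collects the output in memory.
    static byte[] toBytes(String[][] rows) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeWithBom(buffer, rows);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // "\uD55C\uAE00" is two Hangul syllables, written as escapes so the
        // source file's own encoding does not matter.
        byte[] bytes = toBytes(new String[][] {{"a", "\uD55C\uAE00"}});
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // prints "EF BB BF"
    }
}
```

To write a real file, pass a FileOutputStream to writeWithBom instead of the in-memory buffer.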

goodhyun answered Oct 12 '22 05:10


UTF-8 and UTF-8 Signature (which is sometimes incorrectly called UTF-8 BOM) are the same encoding; the signature only serves to distinguish UTF-8 from other encodings. Any Unicode-aware application should handle the UTF-8 signature (the three-byte sequence EF BB BF) correctly.

I don't know why Java specifically adds this signature or how to stop it from doing so.
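
As a rough illustration of what "processing the signature" can mean on the reading side, the sketch below checks a byte array for the leading EF BB BF and skips it before decoding. The names Utf8SignatureSketch, hasSignature and decode are hypothetical, and this is just one simple way to do it, not how any particular application implements it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class Utf8SignatureSketch {

    // True if the data starts with the three-byte UTF-8 signature EF BB BF.
    static boolean hasSignature(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Decodes the data as UTF-8, skipping the signature when it is present.
    static String decode(byte[] data) {
        int offset = hasSignature(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }
}
```

With this, a file with the signature and the same file without it decode to the same text, which is the sense in which the two are the same encoding.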

Petr Abdulin answered Oct 12 '22 05:10