 

How to add a UTF-8 BOM in Java?

I have a Java stored procedure that fetches records from a table using a ResultSet object and creates a CSV file.

    BLOB retBLOB = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
    retBLOB.open(BLOB.MODE_READWRITE);
    OutputStream bOut = retBLOB.setBinaryStream(0L);

    ZipOutputStream zipOut = new ZipOutputStream(bOut);
    PrintStream out = new PrintStream(zipOut, false, "UTF-8");
    out.write('\ufeff');
    out.flush();

    zipOut.putNextEntry(new ZipEntry("filename.csv"));
    while (rs.next()) {
        out.print("\"" + rs.getString(i) + "\"");
        out.print(",");
    }
    out.flush();

    zipOut.closeEntry();
    zipOut.close();
    retBLOB.close();

    return retBLOB;

But the generated CSV file doesn't show German characters correctly. The Oracle database's NLS_CHARACTERSET is also UTF8.

Please suggest.

asked Dec 08 '10 at 15:12 by Fadd



People also ask

What is UTF-8 with BOM?

The UTF-8 file signature (commonly also called a "BOM") identifies the encoding format rather than the byte order of the document. UTF-8 is a linear sequence of bytes, not a sequence of 2-byte or 4-byte units where byte order matters. In UTF-8, the encoded BOM is the byte sequence EF BB BF.
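To see the difference, you can encode the single character U+FEFF with several charsets: the UTF-8 bytes are always the same, while the UTF-16 bytes depend on the byte order. This is a minimal sketch using only `java.nio.charset.StandardCharsets`; the class name `BomBytes` and the `hex` helper are hypothetical, chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public class BomBytes {
    // Formats a byte array as space-separated uppercase hex, e.g. "EF BB BF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bom = "\uFEFF";
        // UTF-8 has exactly one encoding of U+FEFF, so it marks the encoding only.
        System.out.println("UTF-8:    " + hex(bom.getBytes(StandardCharsets.UTF_8)));    // EF BB BF
        // In UTF-16 the same character encodes differently per byte order.
        System.out.println("UTF-16BE: " + hex(bom.getBytes(StandardCharsets.UTF_16BE))); // FE FF
        System.out.println("UTF-16LE: " + hex(bom.getBytes(StandardCharsets.UTF_16LE))); // FF FE
    }
}
```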

How do I view UTF-8 BOM?

To check whether a file has a BOM, open it in Notepad++ and look at the encoding shown in the bottom-right corner of the status bar. If it says "UTF-8-BOM", the file starts with a BOM.
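The same check can be done programmatically by reading the first three bytes of the file and comparing them with EF BB BF. A minimal sketch, assuming Java 9+ (for `InputStream.readNBytes`); the class `BomCheck` and method `hasUtf8Bom` are hypothetical names, and the file path is taken from the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck {
    // Returns true if the file starts with the UTF-8 BOM bytes EF BB BF.
    static boolean hasUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[3];
            int n = in.readNBytes(head, 0, 3); // Java 9+; reads up to 3 bytes
            return n == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println(file + (hasUtf8Bom(file) ? " starts with a UTF-8 BOM" : " has no UTF-8 BOM"));
    }
}
```

Files shorter than three bytes are reported as having no BOM.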


2 Answers

    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(...), StandardCharsets.UTF_8));
    out.write('\ufeff');
    out.write(...);

This correctly writes out 0xEF 0xBB 0xBF to the file, which is the UTF-8 representation of the BOM.
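A self-contained version of the same approach might look like the sketch below; it uses try-with-resources so the writer is always flushed and closed. The class `WriteCsvWithBom`, the method `writeWithBom`, the file name `out.csv` and the sample row are placeholders, not part of the original answer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteCsvWithBom {
    // Writes text to the file as UTF-8, preceded by a BOM.
    static void writeWithBom(Path file, String text) throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8))) {
            out.write('\ufeff'); // one char, U+FEFF; the writer encodes it as EF BB BF
            out.write(text);
        } // closing the writer flushes it and closes the file
    }

    public static void main(String[] args) throws IOException {
        // Sample row "Straße","Größe", written with escapes to keep the source ASCII-only.
        writeWithBom(Paths.get("out.csv"), "\"Stra\u00dfe\",\"Gr\u00f6\u00dfe\"\r\n");
    }
}
```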

answered Oct 02 '22 at 14:10 by astro



Just in case people are using PrintStreams, you need to do it a little differently. While a Writer encodes the single character U+FEFF into the 3 bytes of the UTF-8 BOM, PrintStream.write(int) writes one raw byte per call, so you have to write all 3 bytes of the UTF-8 BOM individually:

    // Print the UTF-8 BOM; write(int) keeps only the low-order 8 bits of its argument
    PrintStream out = System.out;
    out.write('\ufeef'); // emits 0xef
    out.write('\ufebb'); // emits 0xbb
    out.write('\ufebf'); // emits 0xbf

Alternatively, you can use the hex values for those directly:

    PrintStream out = System.out;
    out.write(0xef); // emits 0xef
    out.write(0xbb); // emits 0xbb
    out.write(0xbf); // emits 0xbf
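Applied to the question's ZIP case, two things seem to need changing. First, the BOM has to be written after `putNextEntry`: `ZipOutputStream` throws a `ZipException` ("no current ZIP entry") for earlier writes, and `PrintStream` swallows that exception, only setting its error flag (visible through `checkError()`). Second, `PrintStream.write(int)` emits only the low-order byte, so `write('\ufeff')` produces a single 0xFF; `print(char)` instead encodes the character through the stream's charset, giving EF BB BF with a UTF-8 PrintStream. The sketch below assumes a `ByteArrayOutputStream` in place of the Oracle BLOB stream and a `List<String>` in place of the ResultSet; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZippedCsvWithBom {
    // Writes the values as one CSV entry inside a ZIP archive, with the UTF-8 BOM
    // at the start of the CSV entry (not at the start of the ZIP file itself).
    static void writeZippedCsv(OutputStream target, List<String> values) throws IOException {
        ZipOutputStream zipOut = new ZipOutputStream(target);
        PrintStream out = new PrintStream(zipOut, false, "UTF-8");

        // Open the entry first; ZipOutputStream rejects writes before putNextEntry.
        zipOut.putNextEntry(new ZipEntry("filename.csv"));

        // print(char) encodes U+FEFF as EF BB BF; write('\ufeff') would emit only 0xFF.
        out.print('\ufeff');
        for (String value : values) {
            out.print("\"" + value + "\"");
            out.print(",");
        }
        out.flush();
        // PrintStream never throws IOException, so check its error flag explicitly.
        if (out.checkError()) {
            throw new IOException("PrintStream reported a write error");
        }

        zipOut.closeEntry();
        zipOut.close(); // finishes the archive and also closes target
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Sample values stand in for rs.getString(i); escapes keep the source ASCII-only.
        writeZippedCsv(buffer, List.of("M\u00fcnchen", "Stra\u00dfe"));
        System.out.println("ZIP size: " + buffer.size() + " bytes");
    }
}
```

Checking `checkError()` is optional, but without it a failed write through a `PrintStream` goes unnoticed, which is how the misplaced BOM in the question disappears silently.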
answered Oct 02 '22 at 15:10 by Christopher Schultz
