
Writing UTF-8 without BOM

Tags: java, notepad++, unicode, utf-8, byte-order-mark

This code,

OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
out.write("A".getBytes());

And this,

OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
out.write("A".getBytes(StandardCharsets.UTF_8));

produce the same result (in my opinion), which is UTF-8 without BOM. However, Notepad++ is not showing any information about the encoding. I'm expecting Notepad++ to show Encode in UTF-8 without BOM, but no encoding is selected in the "Encoding" menu.

Now, this code writes the file in UTF-8 with BOM encoding.

 OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
 byte[] bom = { (byte) 239, (byte) 187, (byte) 191 };
 out.write(bom);
 out.write("A".getBytes());

For this file, Notepad++ displays the encoding as Encode in UTF-8.

Question: What is wrong with the first two snippets, which are supposed to write the file in UTF-8 without BOM? Is my Java code doing the right thing? If so, is there a problem with how Notepad++ detects the encoding?

Is Notepad++ only guessing?


asked Nov 04 '13 13:11

Mawia

1 Answer

"A" written using UTF-8 without a BOM produces exactly the same file as "A" written using ASCII, ISO-8859-*, or any other ASCII-compatible encoding. That file contains a single byte with the decimal value 65.

Think of it this way:

"A".getBytes("UTF-8") returns a new byte[] { 65 }
"A".getBytes("ISO-8859-1") returns a new byte[] { 65 }
You write the results of those calls into a file
How is the consumer of the file supposed to distinguish the two?

There's nothing in that file that suggests that UTF-8 needs to be used to decode it.
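
To see this for yourself, here is a minimal sketch that encodes "A" with three ASCII-compatible charsets and compares the resulting bytes. The class name SameBytesDemo and the helper sameBytes are made up for this illustration; only the standard getBytes(Charset) and Arrays calls come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question or answer.
public class SameBytesDemo {

    // Returns true if the text encodes to identical bytes in
    // UTF-8, ISO-8859-1 and US-ASCII.
    static boolean sameBytes(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, latin1) && Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        // The single byte 65 is all that ends up in the file.
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_8))); // [65]
        System.out.println(sameBytes("A")); // true
    }
}
```

Since all three encodings yield the same single byte, an editor that opens the file has nothing to base a decision on.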

Try writing "Käsekuchen" or something else that's not encodable in ASCII and see whether Notepad++ guesses the encoding correctly. That's exactly what it does: it makes an educated guess, because there's no metadata that tells it which encoding to use.
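
As a rough illustration of why a non-ASCII character gives the editor something to work with, the sketch below prints the bytes of "Käsekuchen" in UTF-8 and in ISO-8859-1. The class name UmlautBytesDemo and the helper hex are hypothetical, and the ä is written as \u00e4 so the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
public class UmlautBytesDemo {

    // Encodes the text with the given charset and formats the bytes
    // as space-separated uppercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "K\u00e4sekuchen"; // "Käsekuchen"

        // UTF-8 encodes ä as the two bytes C3 A4.
        System.out.println(hex(text, StandardCharsets.UTF_8));
        // prints: 4B C3 A4 73 65 6B 75 63 68 65 6E

        // ISO-8859-1 encodes ä as the single byte E4.
        System.out.println(hex(text, StandardCharsets.ISO_8859_1));
        // prints: 4B E4 73 65 6B 75 63 68 65 6E
    }
}
```

The byte pair C3 A4 is a well-formed UTF-8 sequence, whereas a lone E4 followed by an ASCII byte is not valid UTF-8, so the two files are now distinguishable and a detector has real evidence for its guess.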


answered Oct 08 '22 17:10

Joachim Sauer
