Eclipse wrong Java properties UTF-8 encoding

Tags:

I have a JavaEE project, in which I use message properties files. The encoding of those file is set to UTF-8. In the file I use the german umlauts like ä, ö, ü. The problem is, sometimes those characters are replaced with unicode like \uFFFD\uFFFD, but not for every character. Now, I have a case where ä and ü are both replaced with \uFFFD\uFFFD, but not for every occurring of ä and ü.

The Git diff shows me something like this:

 mail.adresses=E-Mail hinzufügen: -mail.adresses.multiple=E-Mails durch Kommata getrennt hinzufügen. +mail.adresses.multiple=E-Mails durch Kommata getrennt hinzuf\uFFFD\uFFFDgen.  mail.title=Einladungs-E-Mail  box.preview=Vorschau  box.share.text=Sie können jetzt die ausgewählten Bilder mit Ihren Freunden teilen. @@ -6880,7 +6880,7 @@ browser.cancel=Abbrechen  browser.selectImage=übernehmen  browser.starImage=merken  browser.removeImage=Löschen -browser.searchForSimilarImages=ähnliche +browser.searchForSimilarImages=\uFFFD\uFFFDhnliche  browser.clear_drop_box=löschen

Also, there are lines changed, which I have not touched. I don't understand why I get such a behavior. What could be the cause for the above problem?

My system:

Antergos / Arch Linux

System encoding UTF-8

Python 3.5.0 (default, Sep 20 2015, 11:28:25)  [GCC 5.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.getdefaultencoding() 'utf-8'

Eclipse Mars 1
- Text file encoding UTF-8
- Properties file encoding UTF-8
Tomcat 8
Java JDK 8

If I use another Editor like Atom to edit those message properties files, I don't ran into this problem.

I also realized in a case, if I copy the original value browser.searchForSimilarImages=ähnliche from Git diff and replace the wrong value browser.searchForSimilarImages=\uFFFD\uFFFDhnliche in Eclipse with that, then I have the correct umlauts in the message properties file.

929

asked Jun 30 '15 16:06

BuZZ-dEE

1 Answers

Root cause:

By default ISO 8859-1 character encoding is used for Eclipse properties file (read here), so if the file contains any character beyond ISO 8859-1 then it will not be processed as expected.

Solution 1

If you use Eclipse then you will notice that it implicitly converts the special character into \uXXXX equivalent. Try copying

会意字 / 會意字

into a properties file opened in Eclipse.

EDIT: As per comment from OP

Update the encoding of your Eclipse as shown below. If you set encoding as UTF-32 then even you can see Chinese character, which you cannot see generally.

How to change Encoding of properties file in Eclipse: See this Eclipse Bugzilla bug for more details, which talks about several other possibilities and in the end suggest what I have highlighted below. enter image description here

Chinese characters can be seen in Eclipse after encoding is set properly: enter image description here

Solution 2

If above doesn't work consistently for you (it does work for me and I never see encoding issues) then try this using some Eclipse plugin which handles encoding of properties or other files. For example Eclipse ResourceBundle Editor or Extended Resource-Bundle editor

I would recommend using Eclipse ResourceBundle Editor.

Solution 3

Another possibility to change encoding of file is using Edit --> Set Encoding option. It really matters because it changes the default character set and file encoding. Play around with by changing encoding using Edit --> Set Encoding option and do following Java sysout System.out.println("Default Charset=" + Charset.defaultCharset()); and System.out.println(System.getProperty("file.encoding"));

enter image description here

As an aside: 1

Process the properties file to have content with ISO 8859-1 character encoding by using native2ascii - Native-to-ASCII Converter

What native2ascii does: It converts all the non-ISO 8859-1 character in their equivalent \uXXXX. This is a good tool because you need not to search the \uXXXX equivalent of special character.

Usage for UTF-8: native2ascii -encoding utf8 e:\a.txt e:\b.txt

As an aside: 2

Every computer program whether an IDE, application server, web server, browser, etc. understands only bits, so it need to know how to interpret the bits to make expected sense out of it because depending upon encoding used, same bits can represent different characters. And that's where "Encoding" comes into picture by giving a unique identifier to represent a character so that all computer programs, diverse OS etc. knows exact right way to interpret it.

So, if you have written into a file using some encoding scheme, lets say UTF-8, and then reading using any editor but running with encoding scheme as UTF-8 then you can expect to get correct display.

Please do read my this answer to get more details but from browser-server perspective.

188

answered Oct 04 '22 14:10

hagrawal

Related questions
                            
                                Is this a bug in Files.lines(), or am I misunderstanding something about parallel streams?
                            
                                How Spring Boot Application works internally? [closed]
                            
                                How do I add a type to GWT's Serialization Policy whitelist?
                            
                                C# equivalent to Java's continue <label>?
                            
                                build.xml in Java project
                            
                                Geometry library for Java [closed]
                            
                                ExecutorService, standard way to avoid to task queue getting too full
                            
                                Is regex too slow? Real life examples where simple non-regex alternative is better
                            
                                XMLStreamWriter: indentation
                            
                                Arraylist in parcelable object
                            
                                HashMap Serializability
                            
                                Redirect System.out and System.err to slf4j
                            
                                Array declaration and initialization in Java. Arrays behave differently, when the position of their subscript indices is changed in their declaration
                            
                                UnsupportedClassVersionError on running play application with JDK 1.7
                            
                                MediaPlayer.setDataSource(String) not working with local files
                            
                                Reading from src/main/resources gives NullPointerException
                            
                                Apache Commons Lang 2 vs 3
                            
                                How do I move from Java to C#?
                            
                                How do I add an Implementation-Version value to a jar manifest using Maven?
                            
                                How to acquire a lock by a key

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Eclipse wrong Java properties UTF-8 encoding

Tags:

java

eclipse

utf-8

properties-file