Reading UTF-8 - BOM marker

I'm reading a file through a FileReader. The file is UTF-8 encoded (with a BOM). My problem is that when I read the file and output a string, the BOM marker is output too. Why does this occur?

    fr = new FileReader(file);
    br = new BufferedReader(fr);
    String tmp = null;
    while ((tmp = br.readLine()) != null) {
        String text;
        text = new String(tmp.getBytes(), "UTF-8");
        content += text + System.getProperty("line.separator");
    }

Output after the first line:

?<style>

asked Feb 04 '11 12:02

onigunn

2 Answers

In Java, you have to consume the UTF-8 BOM manually if it is present. This behaviour is documented in two entries in the Java bug database. There will be no fix for now because it would break existing tools such as JavaDoc or XML parsers. Apache Commons IO provides a BOMInputStream to handle this situation.

Take a look at this solution: Handle UTF8 file with BOM
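As a rough illustration of what consuming the BOM manually can look like, here is a minimal sketch that uses only the JDK. The class and method names (Utf8BomSkipper, skipBom, readContent) are hypothetical and not taken from the linked solution. It assumes the file really is UTF-8 and only checks for the UTF-8 BOM bytes EF BB BF; other encodings are not handled.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class Utf8BomSkipper {

    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if there is one. */
    static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or end of stream.
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the caller still sees them.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Decodes the stream as UTF-8 and joins its lines with the platform line separator. */
    static String readContent(InputStream in) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(skipBom(in), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                content.append(line).append(System.lineSeparator());
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file that starts with a UTF-8 BOM.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 's', 't', 'y', 'l', 'e', '>'};
        System.out.print(readContent(new ByteArrayInputStream(withBom)));
    }
}
```

For a real file, the same readContent call should work with a FileInputStream in place of the ByteArrayInputStream. Decoding explicitly with UTF-8 through InputStreamReader also avoids depending on the platform default charset.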

answered Sep 21 '22 23:09

RealHowTo

The easiest fix is probably just to remove the resulting \uFEFF from the string, since it is extremely unlikely to appear for any other reason.

tmp = tmp.replace("\uFEFF", "");
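A slightly narrower variant of the same idea, as a sketch only: strip \uFEFF just from the start of the first line instead of from every line. The names BomStripper, stripLeadingBom and readContent are hypothetical. This assumes the Reader already decodes the file as UTF-8, so that the BOM arrives as the single character \uFEFF; if the reader uses a different charset, the BOM bytes may show up as other characters and this check will not match.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class; the names are illustrative only.
class BomStripper {

    /** Removes a single leading U+FEFF, leaving the rest of the string untouched. */
    static String stripLeadingBom(String s) {
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    /** Reads all lines, stripping a BOM from the first line only. */
    static String readContent(Reader reader) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (first) {
                    line = stripLeadingBom(line);
                    first = false;
                }
                content.append(line).append(System.lineSeparator());
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for a UTF-8 decoding reader over the real file.
        System.out.print(readContent(new StringReader("\uFEFF<style>\nbody {}")));
    }
}
```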

Also see this Guava bug report

answered Sep 21 '22 23:09

finnw

Tags: java, file, encoding
