Right way to deal with Unicode BOM in a text file

Question

I am reading a text file in my program that contains the Unicode BOM character \ufeff (decimal 65279) at various points. This causes several problems in later parsing.

Right now I am detecting and filtering these characters myself, but I would like to know whether the Java standard library or Guava offers a cleaner way to do this.

Boris the Spider · Accepted Answer

There is no built-in way of dealing with a (UTF-8) BOM in Java or, indeed, in Guava.
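Because the question reports BOMs appearing at several points in the file rather than only at the start, the manual approach amounts to stripping every U+FEFF after decoding. Below is a minimal sketch of that, assuming the file is UTF-8; `BomFilter`, `removeBoms` and `readLinesWithoutBoms` are hypothetical names, not an existing API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

class BomFilter {
    // U+FEFF is the byte order mark (also known as ZERO WIDTH NO-BREAK SPACE)
    static final String BOM = "\uFEFF";

    // Removes every U+FEFF in the string, not just a leading one
    static String removeBoms(String s) {
        return s.replace(BOM, "");
    }

    // Reads the file as UTF-8 (an assumption) and strips U+FEFF from each line
    static List<String> readLinesWithoutBoms(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8).stream()
                .map(BomFilter::removeBoms)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        try {
            // UTF-8 file with one BOM at the start and another mid-line
            Files.write(tmp, "\uFEFFfirst\nsec\uFEFFond\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(readLinesWithoutBoms(tmp)); // prints [first, second]
        } finally {
            Files.delete(tmp);
        }
    }
}
```

With Guava on the classpath, `CharMatcher.is('\uFEFF').removeFrom(line)` does the same job per line, although it is a general character-matching utility rather than dedicated BOM support.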

There is currently a bug report on the Guava website about dealing with a BOM in Guava IO.

There are several SO posts (here and here) on how to detect/skip the BOM while reading a file in plain Java.
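A common plain-Java pattern for skipping only a leading BOM is to wrap the stream in a `BufferedReader`, peek at the first character with `mark`/`reset`, and consume it only if it is U+FEFF. This is a sketch under the assumption that the charset is known in advance; `BomSkippingReader.open` is a hypothetical helper name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSkippingReader {
    // Opens a reader over the stream and consumes a single leading U+FEFF, if present.
    // Any U+FEFF characters later in the text are left untouched.
    static BufferedReader open(InputStream in, Charset charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        reader.mark(1);            // remember the position before the first character
        int first = reader.read();
        if (first != '\uFEFF') {
            reader.reset();        // not a BOM (or an empty stream): rewind
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "\uFEFFhello".getBytes(StandardCharsets.UTF_8);
        try (BufferedReader r = open(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            System.out.println(r.readLine()); // prints hello
        }
    }
}
```

Because the check runs on decoded characters, the same helper works for UTF-8, UTF-16BE and UTF-16LE input; it deliberately does not remove BOMs that appear later in the file.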

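Your BOM (\ufeff) may mean the file is UTF-16, which, according to the same Guava report, Java should handle automatically. This SO post seems to suggest the same. Note, however, that \ufeff is how the BOM appears in a decoded Java String whatever the file's encoding (in a UTF-8 file it is stored as the bytes EF BB BF), so seeing it does not by itself prove the file is UTF-16. Java's "UTF-16" charset consumes a leading BOM when decoding, but "UTF-16BE", "UTF-16LE" and "UTF-8" keep it as an ordinary U+FEFF character, and no decoder removes U+FEFF characters that appear later in the text.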
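One quick way to see which decoders handle the BOM for you is to decode the same bytes with different charsets. The byte arrays below are made-up inputs (a BOM followed by 'A'); the UTF-16 results match what the `java.nio.charset.Charset` javadoc describes for the UTF-16 family.

```java
import java.nio.charset.StandardCharsets;

class BomDecodingDemo {
    // Big-endian UTF-16 BOM (FE FF) followed by 'A' (00 41)
    static final byte[] UTF16_BE_WITH_BOM = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
    // UTF-8 encoding of the BOM (EF BB BF) followed by 'A' (41)
    static final byte[] UTF8_WITH_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};

    public static void main(String[] args) {
        // "UTF-16" reads the BOM to choose the byte order, then drops it: prints 1
        System.out.println(new String(UTF16_BE_WITH_BOM, StandardCharsets.UTF_16).length());
        // "UTF-16BE" keeps the BOM as an ordinary U+FEFF character: prints 2
        System.out.println(new String(UTF16_BE_WITH_BOM, StandardCharsets.UTF_16BE).length());
        // "UTF-8" keeps it too: prints 2
        System.out.println(new String(UTF8_WITH_BOM, StandardCharsets.UTF_8).length());
    }
}
```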

Tags: java, file-io, character-encoding, guava

missingfaktor
