 

Right way to deal with Unicode BOM in a text file

My program reads a text file that contains Unicode BOM characters (\ufeff, decimal 65279) in places. These characters cause several problems during later parsing.

Right now I am detecting and filtering these characters myself but would like to know if Java standard library or Guava has a way to do this more cleanly.

asked Dec 26 '22 05:12 by missingfaktor



1 Answer

There is no built-in way of dealing with a UTF-8 BOM in Java or, indeed, in Guava: Java's UTF-8 decoder passes a leading BOM through as a \ufeff character instead of removing it.

There is currently a bug report on the Guava website about dealing with a BOM in Guava IO.

There are several SO posts (here and here) on how to detect/skip the BOM while reading a file in plain Java.
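A minimal plain-Java sketch of the usual approach, assuming the file is UTF-8: wrap the decoded `Reader` in a `PushbackReader`, consume the first character if it is U+FEFF, and otherwise push it back. Because the question says U+FEFF appears "in places" rather than only at the start, the sketch also offers a helper that strips every occurrence from a line; whether mid-text occurrences should be dropped is an assumption about the asker's data. The names `BomSkipping`, `skipLeadingBom`, `openUtf8` and `stripAllBoms` are hypothetical, not part of the JDK or Guava.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomSkipping {
    // U+FEFF: the byte order mark as it appears once bytes are decoded to chars.
    private static final char BOM = 0xFEFF;

    /** Returns a reader that skips a single leading U+FEFF, if there is one. */
    public static Reader skipLeadingBom(Reader in) throws IOException {
        PushbackReader pushback = new PushbackReader(in, 1);
        int first = pushback.read();
        if (first != -1 && first != BOM) {
            pushback.unread(first); // not a BOM, so give the character back
        }
        return pushback;
    }

    /** Opens a UTF-8 file for line-by-line reading, minus any leading BOM. */
    public static BufferedReader openUtf8(Path path) throws IOException {
        return new BufferedReader(
                skipLeadingBom(Files.newBufferedReader(path, StandardCharsets.UTF_8)));
    }

    /** Removes every U+FEFF, including ones that appear mid-text. */
    public static String stripAllBoms(String s) {
        return s.replace(String.valueOf(BOM), "");
    }

    public static void main(String[] args) throws IOException {
        String text = "\uFEFFfirst\nsec\uFEFFond";
        try (BufferedReader r = new BufferedReader(skipLeadingBom(new StringReader(text)))) {
            String line;
            while ((line = r.readLine()) != null) {
                // The length shows whether a U+FEFF survived skipLeadingBom.
                System.out.println(line.length() + " " + stripAllBoms(line));
            }
        }
    }
}
```

Running `main` prints `5 first` and then `7 second`: the leading BOM is gone before the first line is read, while the mid-line U+FEFF is still counted in the second line's length and only disappears through `stripAllBoms`.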

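Note that seeing \ufeff in a Java String does not by itself mean the file is UTF-16: U+FEFF is the BOM code point in every Unicode encoding, so a UTF-8 BOM (bytes EF BB BF) decoded as UTF-8 also shows up as \ufeff. For UTF-16, Java does handle the BOM automatically, as the same Guava report notes (and this SO post seems to suggest): the UTF-16 charset consumes a leading BOM when decoding. The UTF-16BE and UTF-16LE charsets do not, however, and no decoder removes U+FEFF characters that occur later in the text.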
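To check which decoders remove a BOM, the sketch below decodes a few hand-built byte sequences with the JDK's standard charsets and prints each resulting char as a code point, so an otherwise invisible U+FEFF becomes visible. The class and helper names (`BomDecoding`, `decode`, `codes`) are hypothetical; the behavior shown is what the `java.nio.charset.Charset` documentation describes for UTF-8, UTF-16 and UTF-16BE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDecoding {
    /** Builds a byte[] from unsigned values (0-255) and decodes it with the given charset. */
    static String decode(int[] unsignedBytes, Charset cs) {
        byte[] b = new byte[unsignedBytes.length];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) unsignedBytes[i];
        }
        return new String(b, cs);
    }

    /** Renders each char as "U+XXXX" so that U+FEFF is visible. */
    static String codes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // UTF-8 BOM + "A": the UTF-8 decoder keeps the BOM as U+FEFF.
        System.out.println(codes(decode(new int[]{0xEF, 0xBB, 0xBF, 0x41}, StandardCharsets.UTF_8)));
        // prints "U+FEFF U+0041"

        // UTF-16 big-endian BOM + "A": the UTF_16 decoder consumes the leading BOM.
        System.out.println(codes(decode(new int[]{0xFE, 0xFF, 0x00, 0x41}, StandardCharsets.UTF_16)));
        // prints "U+0041"

        // Same bytes with the explicit UTF_16BE charset: the BOM is kept.
        System.out.println(codes(decode(new int[]{0xFE, 0xFF, 0x00, 0x41}, StandardCharsets.UTF_16BE)));
        // prints "U+FEFF U+0041"

        // A second FE FF later in a UTF-16 stream is not treated as a BOM.
        System.out.println(codes(decode(
                new int[]{0xFE, 0xFF, 0x00, 0x41, 0xFE, 0xFF, 0x00, 0x42}, StandardCharsets.UTF_16)));
        // prints "U+0041 U+FEFF U+0042"
    }
}
```

The last case matters for the question: even with a BOM-aware decoder, U+FEFF characters that appear "in places" inside the file survive decoding and still have to be filtered by hand.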

answered Dec 31 '22 12:12 by Boris the Spider
