I am reading a text file in my program which contains some Unicode BOM characters (\ufeff / 65279) in places. This presents several issues in further parsing.
Right now I am detecting and filtering these characters myself, but I would like to know whether the Java standard library or Guava has a way to do this more cleanly.
There is no built-in way of dealing with a (UTF-8) BOM in Java or, indeed, in Guava.
There is currently a bug report on the Guava website about dealing with a BOM in Guava IO.
There are several SO posts (here and here) on how to detect/skip the BOM while reading a file in plain Java.
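For illustration, here is a minimal sketch of the detect-and-skip approach in plain Java. The class and method names (BomSkipper, skipLeadingBom, stripAllBoms) are made up for this example and are not part of the JDK or Guava. It assumes the input has already been decoded to characters, so the BOM shows up as the single char \ufeff.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not a JDK or Guava class.
final class BomSkipper {
    private static final char BOM = '\uFEFF';

    private BomSkipper() {}

    /** Returns a reader positioned after a leading U+FEFF, if one is present. */
    static BufferedReader skipLeadingBom(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        reader.mark(1);                 // remember the start of the stream
        if (reader.read() != BOM) {
            reader.reset();             // first char is real content (or EOF): rewind
        }
        return reader;
    }

    /** Removes every U+FEFF from a string, including ones in the middle. */
    static String stripAllBoms(String text) {
        return text.replace("\uFEFF", "");
    }
}
```

In practice you would probably wrap something like Files.newBufferedReader(path, StandardCharsets.UTF_8) with skipLeadingBom. Since the question mentions the character turning up "in places", stripAllBoms may be the closer fit for text that has already been read into a String.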
Your BOM (\ufeff) seems to be UTF-16 which, according to the same Guava report, should be dealt with automatically by Java. This SO post seems to suggest the same.
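To see the difference between the two cases, here is a small sketch that decodes hand-built byte arrays. The class name BomDecodingDemo and the sample bytes are made up for the example. It relies on the JDK's UTF-16 charset consuming a leading byte-order mark while its UTF-8 charset passes the mark through as the char \ufeff. Note that \ufeff is what you see after decoding in either case, so it is worth checking which charset the file is actually being read with.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares BOM handling of the UTF-8 and UTF-16 decoders.
final class BomDecodingDemo {
    private BomDecodingDemo() {}

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // "hi" preceded by the UTF-8 encoding of U+FEFF (EF BB BF)
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // "hi" in big-endian UTF-16, preceded by the BOM bytes FE FF
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0, 'h', 0, 'i'};

        String fromUtf8 = decodeUtf8(utf8);
        String fromUtf16 = decodeUtf16(utf16);

        // The UTF-8 result still starts with \ufeff; the UTF-16 result does not.
        System.out.println(fromUtf8.charAt(0) == '\uFEFF');
        System.out.println(fromUtf16.equals("hi"));
    }
}
```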