
Removing the BOM character with Java [duplicate]

I am trying to read files using FileReader and write them into a separate file.
These files are UTF-8 encoded, but unfortunately some of them still contain a BOM.
The relevant code I tried is this:

private final String UTF8_BOM = "\uFEFF";

private String removeUTF8BOM(String s)
{
    if (s.startsWith(UTF8_BOM))
    {
        s = s.replace(UTF8_BOM, "");
    }
    return s;
}

line = removeUTF8BOM(line);

But for some reason the BOM is not removed. Is there any other way I can do this with FileReader? I know that there is the BOMInputStream that should work, but I'd rather find a solution using FileReader.

randomQuestion asked Mar 18 '23 06:03


1 Answer

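FileReader is an old utility class that uses the platform default encoding unless a charset is passed to its constructor, which is only possible since Java 11. On Windows the default is likely not UTF-8, so the three BOM bytes are decoded as three separate characters instead of the single character U+FEFF, and startsWith("\uFEFF") never matches.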

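It is best to read with a class that lets you specify the encoding explicitly, for example an InputStreamReader wrapped around a FileInputStream, or Files.newBufferedReader.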
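As a rough illustration of reading with an explicit encoding, here is a minimal sketch that copies a UTF-8 file line by line and drops a leading BOM. The class name BomStrippingCopy and the methods stripBom and copyWithoutBom are made up for this example and are not from the question; it also assumes the input really is valid UTF-8.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, named only for this example.
class BomStrippingCopy {
    private static final String BOM = "\uFEFF";

    // Removes one leading BOM if present; any later U+FEFF is left untouched.
    static String stripBom(String s) {
        return s.startsWith(BOM) ? s.substring(1) : s;
    }

    // Copies source to target, decoding and encoding as UTF-8 regardless of
    // the platform default. Every line is written with the platform line
    // separator, so the last line gets one even if the source lacked it.
    static void copyWithoutBom(Path source, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line = in.readLine();
            if (line != null) {
                line = stripBom(line); // a BOM can only sit at the very start of the file
            }
            while (line != null) {
                out.write(line);
                out.newLine();
                line = in.readLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copyWithoutBom(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Because the reader decodes as UTF-8, the three BOM bytes arrive as the single character U+FEFF, so the startsWith check from the question behaves as expected. Only the first line needs the check.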

For amusement, and to clarify the error, here is a dirty hack that works on platforms with single-byte encodings:

private final String UTF8_BOM = new String("\uFEFF".getBytes(StandardCharsets.UTF_8));

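This encodes the BOM as its UTF-8 bytes (EF BB BF) and then decodes those bytes using the current platform encoding. The result is the same mangled character sequence that FileReader produces for the BOM, for example "ï»¿" under windows-1252.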
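To make the mismatch concrete, the following sketch decodes the BOM bytes with a fixed single-byte charset instead of the platform default, so it behaves the same on every machine. ISO-8859-1 stands in here for a typical Windows default, and the class name BomMisdecodeDemo and method decodeBomAs are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named only for this example.
class BomMisdecodeDemo {
    // Decodes the UTF-8 bytes of U+FEFF (EF BB BF) with the given charset,
    // mimicking a reader that assumes that charset for a UTF-8 file.
    static String decodeBomAs(Charset charset) {
        byte[] bomBytes = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        return new String(bomBytes, charset);
    }

    public static void main(String[] args) {
        String wrong = decodeBomAs(StandardCharsets.ISO_8859_1);
        String right = decodeBomAs(StandardCharsets.UTF_8);
        System.out.println(wrong.length());             // 3
        System.out.println(right.length());             // 1
        System.out.println(wrong.startsWith("\uFEFF")); // false
    }
}
```

With the single-byte charset each BOM byte becomes its own character, which is why comparing against "\uFEFF" fails in the question's code.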

Needless to say, FileReader is non-portable: it is only suitable for local files written in the platform encoding.

Joop Eggen answered Mar 26 '23 03:03