A page in Java's Bug Database (http://bugs.sun.com/view_bug.do?bug_id=4508058) says that Sun/Oracle will not fix the problem of Java not parsing the BOM of a UTF-8-encoded string. Since the most recent comment on that page dates back to 2010, I would like to know whether there is any more recent information. Is it still true that Java cannot handle a UTF-8 BOM?
Yes, it is still true that Java does not handle the BOM in UTF-8-encoded files: the UTF-8 decoder does not discard it, so it shows up as a leading U+FEFF character in the decoded text. I came across this issue when parsing several XML files for data formatting purposes. Since you can't know in advance which files will contain a BOM, I would suggest checking for it at runtime and stripping it if present, or following the advice that tchrist gave.
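To illustrate the "strip it at runtime" suggestion, here is a minimal sketch rather than a definitive implementation. The class name `BomStripper` and both helper methods are made up for this example; only the `java.io` and `java.nio.charset` classes are real JDK API. One helper removes a leading U+FEFF from text that has already been decoded, the other peeks at the first three bytes of a stream and skips them only if they are the UTF-8 BOM (`EF BB BF`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for this example; not part of the JDK.
final class BomStripper {

    // Removes a leading U+FEFF from text that was already decoded.
    static String stripBom(String s) {
        return (s != null && s.startsWith("\uFEFF")) ? s.substring(1) : s;
    }

    // Wraps a byte stream in a UTF-8 Reader, skipping the bytes EF BB BF
    // if (and only if) the stream starts with them.
    static Reader utf8ReaderSkippingBom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() may return fewer.
        int n = 0;
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: push the bytes back so the reader still sees them.
        if (!hasBom && n > 0) {
            pb.unread(head, 0, n);
        }
        return new InputStreamReader(pb, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        try (Reader reader = utf8ReaderSkippingBom(new ByteArrayInputStream(xml))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            System.out.println(sb);
        }
    }
}
```

Doing the check at the byte level means the parser never sees the BOM at all, which is probably the safer choice for XML input. The string variant is handy when the text has already been read by code you don't control.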