 

Working with UTF-8 files in Eclipse

Quite a straightforward question: is there a way to configure Eclipse to work with text files encoded in UTF-8, both with and without the BOM?

So far I've used Eclipse with UTF-8 encoding and it works, but when I try to edit a file generated by another editor that includes the BOM, Eclipse doesn't handle it properly: it shows an invisible character (the BOM) at the beginning of the file. Is there a way to make Eclipse understand UTF-8 files with a BOM?

Pablo Cabrera asked May 25 '10 at 14:05


People also ask

How do I change the encoding of a properties file in eclipse?

Properties files are Latin-1 (ISO-8859-1) encoded by definition, so Eclipse uses ISO-8859-1 as their default encoding. You can change this under Preferences > General > Content Types.
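On the Java side, the Latin-1 rule applies to `java.util.Properties.load(InputStream)`, while `load(Reader)` decodes with whatever charset the `Reader` was built with. The sketch below is a minimal illustration, assuming a hypothetical properties file that was saved as UTF-8; the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {
    // Hypothetical file content (a German greeting with u-umlaut and sharp s), saved as UTF-8.
    static final byte[] UTF8_FILE = "greeting=Gr\u00fc\u00dfe\n".getBytes(StandardCharsets.UTF_8);

    // Properties.load(InputStream) always decodes the bytes as ISO-8859-1.
    static String loadViaStream(byte[] data) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    // Properties.load(Reader) decodes with the Reader's charset, here UTF-8.
    static String loadViaUtf8Reader(byte[] data) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    public static void main(String[] args) {
        // Each non-ASCII letter comes back as two Latin-1 characters: 7 chars instead of 5.
        System.out.println(loadViaStream(UTF8_FILE).length());
        // Decoded correctly, so this prints true.
        System.out.println(loadViaUtf8Reader(UTF8_FILE).equals("Gr\u00fc\u00dfe"));
    }
}
```

If the editor's content-type encoding is switched to UTF-8, the code that reads the file has to use the `Reader` overload to match.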

How do I change the default encoding in Eclipse?

In Eclipse, go to Preferences > General > Workspace and select UTF-8 as the Text File Encoding. This should set the encoding for all the resources in your workspace, so any files you create from now on with the default encoding should all match.

Are Java strings UTF-8?

A Java String is always exposed as a sequence of UTF-16 code units (its `char` values), whatever compact representation the JVM may use to store it, but you really should think about it like this: an encoding is a way to translate between Strings and bytes.
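A short sketch of that String/bytes translation using only the standard `String.getBytes(Charset)` and `new String(byte[], Charset)` calls; the class and helper names are illustrative only. The same two-character string produces a different number of bytes per charset, and decoding with the wrong charset does not round-trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringBytesDemo {
    // Encoding: String -> bytes. The byte count depends on the charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Decoding: bytes -> String, using the charset the caller believes was used.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "h\u00e9"; // 'h' followed by e-acute: two UTF-16 code units
        System.out.println(text.length());                                    // 2
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 3
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 2
        System.out.println(encodedLength(text, StandardCharsets.UTF_16));     // 6 (2-byte BOM + 2 x 2 bytes)

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));      // true
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false: mojibake
    }
}
```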


1 Answer

Neither Eclipse bug 78455 ("Provide an option to force writing a BOM to UTF-8 files") nor bug 136854 leaves much hope for such an option.

The support for encoding in the workspace is based on what is available from Java.
For any given resource in the workspace, it is possible to obtain a charset string that can be used with any Java APIs that take charset strings.
Examples are:

  • 'US-ASCII',
  • 'UTF-8',
  • 'Cp1252',
  • 'UTF-16' (Big Endian, BOM inserted automatically),
  • 'UTF-16BE' (Big Endian, BOM not inserted automatically),
  • 'UTF-16LE' (Little Endian, BOM not inserted automatically).
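The charset strings in that list can be passed directly to `java.nio.charset.Charset.forName`. This minimal sketch (hypothetical class and method names) encodes the single character "A" with each of them and prints the bytes as hex, which makes it visible that only 'UTF-16' prepends a BOM (`FE FF`).

```java
import java.nio.charset.Charset;

public class BomOnEncodeDemo {
    // Formats bytes as space-separated upper-case hex, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes "A" with the named charset and returns the resulting bytes as hex.
    static String encodeA(String charsetName) {
        return hex("A".getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        String[] names = {"US-ASCII", "UTF-8", "Cp1252", "UTF-16", "UTF-16BE", "UTF-16LE"};
        for (String name : names) {
            // Expected: 41, 41, 41, FE FF 00 41, 00 41, 41 00
            System.out.println(name + ": " + encodeA(name));
        }
    }
}
```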

For Java encodings other than 'UTF-16', BOMs are neither inserted automatically (when writing) nor discarded automatically (when reading).
Even if this is puzzling to end users, this is how all Java applications work.
If applications want to support creating UTF-8 files with BOMs to match their users' expectations, they need to provide such capability on their own (as neither Java nor the Resources model will help with that).
Eclipse does provide some support for detecting BOMs, but not for generating or skipping them.
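As an illustration of what "provide such capability on their own" can look like, here is a minimal sketch of application-level UTF-8 BOM handling. It is not an Eclipse or JDK API: `Utf8Bom`, `skipBom`, `writeUtf8` and `readUtf8` are hypothetical names, and it assumes Java 9+ for `readNBytes`/`readAllBytes`. Reading without stripping shows the symptom from the question: the decoded text starts with an extra invisible U+FEFF character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8Bom {
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Wraps a stream and consumes a leading UTF-8 BOM if one is present.
    static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, BOM.length);
        byte[] head = new byte[BOM.length];
        int n = pb.readNBytes(head, 0, head.length);
        boolean hasBom = n == BOM.length
                && head[0] == BOM[0] && head[1] == BOM[1] && head[2] == BOM[2];
        if (!hasBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return pb;
    }

    // Writes text as UTF-8, optionally preceded by a BOM (Java's UTF-8 encoder never adds one).
    static void writeUtf8(OutputStream out, String text, boolean withBom) throws IOException {
        if (withBom) out.write(BOM);
        out.write(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes UTF-8 bytes, optionally stripping a leading BOM first.
    static String readUtf8(byte[] data, boolean stripBom) {
        try {
            InputStream in = new ByteArrayInputStream(data);
            if (stripBom) in = skipBom(in);
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8(out, "hello", true);
        byte[] data = out.toByteArray(); // EF BB BF 68 65 6C 6C 6F

        System.out.println(readUtf8(data, false).length()); // 6: the BOM decodes to U+FEFF
        System.out.println(readUtf8(data, true).length());  // 5
    }
}
```

Using a `PushbackInputStream` keeps the check to a three-byte peek, so files without a BOM pass through unchanged.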

VonC answered Oct 07 '22 at 18:10