
Encoding of a Java source file

When I commit a Java source file to GitHub and create a pull request, the diff shows the whole file as changed. If I select "Hide whitespace changes" on the diff screen, only the lines I actually changed are shown.

Could this be related to file encoding? Notepad++ shows ANSI for the same file in both branches. Unlike GitHub, Beyond Compare shows only the changed lines as a diff.

As a more general question, do .java files contain an encoding header? Is there a single specific encoding assigned to each file?

Thanks.

Saim Doruklu asked May 26 '26 20:05


2 Answers

Newline differences are not (usually) related to encoding differences; the cause is more subtle.

A UTF-8 encoded file on Windows might end up with newlines that are represented as \r\n (also known as CRLF) while a UTF-8 encoded file on a Unix-like OS might end up with just \n (also known as just LF).
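To check which style a given file actually uses, you can count the terminators in its raw bytes. Here is a minimal Python sketch; the function name count_line_endings is made up for illustration:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # LF not preceded by CR
        "cr": data.count(b"\r") - crlf,  # CR not followed by LF
    }
```

Read the file in binary mode (open(path, "rb")) before passing its contents in, so Python does not translate the newlines first. Run it on the file from each branch: if one copy reports only "crlf" and the other only "lf", line endings are the likely culprit.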

This difference is likely the cause of your whole-file diff and can be fixed in one of two ways:

  • either force Git to always use LF no matter what the editors write or
  • configure all editors to use consistent line endings (depends on which editors/IDEs you use).
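To see why the whole file shows up as changed: a line-based diff compares each line including its terminator, so a line ending in \r\n and the same line ending in \n count as different. The Python sketch below uses the standard difflib module; the sample file contents and the helper name changed_lines are invented for illustration.

```python
import difflib


def changed_lines(old: str, new: str) -> int:
    """Number of lines a line-based diff reports as removed or added."""
    diff = difflib.ndiff(old.splitlines(keepends=True),
                         new.splitlines(keepends=True))
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


lf_version = "class A {\n    int x = 1;\n}\n"
# Same file saved with CRLF endings, plus one real edit (1 -> 2).
crlf_version = "class A {\r\n    int x = 2;\r\n}\r\n"

# Raw comparison: every line differs because of the terminators.
raw = changed_lines(lf_version, crlf_version)

# After normalizing CRLF to LF, only the edited line remains.
normalized = changed_lines(lf_version, crlf_version.replace("\r\n", "\n"))
```

Here raw counts all three lines as removed and re-added, while normalized counts only the edited line. That is roughly the effect you get from "Hide whitespace changes", and from having Git normalize line endings before it stores the file.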
Joachim Sauer answered May 28 '26 10:05


These differences usually have at least one of the following causes:

Source encoding

Java does not define an encoding inside the source file, so you have to agree on one with your team members. Usually there's no good reason to choose anything other than UTF-8. If your preferred editor only supports the system encoding... choose another editor.
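To see why the team has to agree: the same bytes decode differently depending on the encoding the reader assumes, while ASCII-only source is byte-identical in UTF-8 and the Windows "ANSI" code pages (which is likely why Notepad++ reports ANSI for a file without non-ASCII characters). A short Python sketch, assuming cp1252 as the ANSI code page and using made-up sample lines:

```python
ascii_source = 'String s = "hello";'
accented_source = 'String s = "caf\u00e9";'  # \u00e9 is e with acute accent

# ASCII-only text produces identical bytes in both encodings.
same_bytes = ascii_source.encode("utf-8") == ascii_source.encode("cp1252")

# Non-ASCII text does not: the accented letter is two bytes in UTF-8
# and one byte in cp1252.
utf8_bytes = accented_source.encode("utf-8")
cp1252_bytes = accented_source.encode("cp1252")

# Reading UTF-8 bytes as cp1252 raises no error, it just yields garbage.
misread = utf8_bytes.decode("cp1252")
```

On the Java side, javac accepts an -encoding option (for example -encoding UTF-8), so the build can state the agreed encoding explicitly instead of relying on a platform default.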

Line endings

Git supports a local, per-developer configuration of line-ending handling through the core.autocrlf setting. I strongly recommend not using it. Instead, make the configuration project-wide by placing a .gitattributes file in your project root, for example:

# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout.
*.java text

# Declare files that will always have Unix LF line endings on checkout.
*.sh text eol=lf
Dockerfile text eol=lf

Indentation

A mixture of tabs and spaces can really mess up your commit history. Many IDEs automatically format or pretty-print the source on every save, but they all do it slightly differently. Agree with your team members on a common source format (and be prepared for longer discussions).

If you cannot even agree on tabs or spaces, here's a good argument: Developers Who Use Spaces Make More Money Than Those Who Use Tabs
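For the mixed tabs-and-spaces problem, a quick way to find out how bad things are is to classify each indented line by what its leading whitespace consists of. A minimal Python sketch; the function name indentation_styles is hypothetical:

```python
def indentation_styles(source: str) -> dict:
    """Classify each indented, non-blank line by its leading whitespace."""
    counts = {"tabs": 0, "spaces": 0, "mixed": 0}
    for line in source.splitlines():
        if not line.strip():
            continue  # skip empty and whitespace-only lines
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue  # line is not indented
        if set(indent) == {"\t"}:
            counts["tabs"] += 1
        elif set(indent) == {" "}:
            counts["spaces"] += 1
        else:
            counts["mixed"] += 1
    return counts
```

A file where more than one of the three counters is non-zero is a candidate for a one-time reformatting commit, kept separate from functional changes so the history stays readable.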

oliver_t answered May 28 '26 11:05


