Do I really need to specify all binary files in .gitattributes?

I've read in the Git documentation that I can explicitly mark certain files as text, so that their line endings are converted automatically, or as binary, so that they are left untouched.

However, I have also read that Git is pretty good at detecting binary files, which makes me think this is not needed. So my question is: do I really need to specify these explicit settings for every file extension in my repository? I've seen some people recommend doing so for all image file extensions.

# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout.
*.c text
*.h text

# Denote all files that are truly binary and should not be modified.
*.png binary
*.jpg binary

Muhammad Rehan Saeed asked Jul 14 '19 19:07


2 Answers

Git will check the first 8,000 bytes of a file to see if it contains a NUL character. If it does, the file is assumed to be binary.

From git's source code:

#define FIRST_FEW_BYTES 8000
int buffer_is_binary(const char *ptr, unsigned long size)
{
    if (FIRST_FEW_BYTES < size)
        size = FIRST_FEW_BYTES;
    return !!memchr(ptr, 0, size);
}

Text files will be detected correctly unless you deliberately put a NUL character in them. Binary files will almost always contain at least one NUL byte within their first 8,000 bytes.
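
To make the heuristic concrete, here is a minimal Python sketch that mirrors the C function quoted above. The name `looks_binary` and the sample byte strings are assumptions made up for illustration, not part of Git, and the sketch reproduces only the NUL check, not anything else Git may do with the file.

```python
FIRST_FEW_BYTES = 8000  # same limit as in the C snippet quoted above


def looks_binary(data: bytes) -> bool:
    """Return True if a NUL byte occurs within the first FIRST_FEW_BYTES bytes."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


# Hypothetical sample contents, chosen only for illustration.
source_file = b"#include <stdio.h>\nint main(void) { return 0; }\n"

# The 8-byte PNG signature followed by the 4-byte big-endian length (13)
# of the first chunk; the length field is where the NUL bytes come from.
png_start = b"\x89PNG\r\n\x1a\n" + (13).to_bytes(4, "big")

# A NUL byte that sits just past the inspected window goes unnoticed.
late_nul = b"a" * FIRST_FEW_BYTES + b"\x00"

print(looks_binary(source_file))  # False
print(looks_binary(png_start))    # True
print(looks_binary(late_nul))     # False
```

The last case shows the one blind spot of the check: only the start of the file is inspected, so a file whose first 8,000 bytes happen to contain no NUL is guessed to be text.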

For the most part, you shouldn't need to declare a file's type explicitly (I don't think I ever have). In practice, declare a specific file's type only if you run into a problem with it.

jhpratt answered Oct 24 '22 02:10


Git is generally good at detecting whether a file is text or binary, so you may not need to set anything explicitly. Setting a default of * text=auto is a good idea regardless, as you point out.

However, if you or anyone else on the project works with files in UTF-16, it's a very good idea to explicitly set the text attribute on those files, along with the working-tree-encoding attribute, because otherwise Git will see the NUL bytes in them and treat them as binary.
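
As a rough illustration, a .gitattributes entry for such files might look like the fragment below. The `*.ps1` pattern and the `UTF-16LE` value are assumptions picked for the example; substitute whatever pattern and encoding match the UTF-16 files in your own repository.

```
# Hypothetical example: treat PowerShell scripts as UTF-16LE text
*.ps1 text working-tree-encoding=UTF-16LE
```

With an entry like this, Git re-encodes the content to UTF-8 when it is added to the index and back to the named encoding on checkout, so the files can be diffed and normalized like other text.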

You should also mark as binary any file type that you think might be misdetected as text. For example, if you have an image format or other file that consists only of printable ASCII bytes, Git might misdetect it as text. You'd want to specify those files explicitly to avoid confusion. Only you would know which files in your repository are likely to hit that issue.
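
Both kinds of misdetection described in this answer can be reproduced with a small Python sketch of the NUL-byte check. Everything here is an assumption for illustration: `looks_binary` is a hypothetical helper that imitates only the 8,000-byte NUL test quoted in the other answer, and the sample data (a UTF-16 string and a plain-text PBM bitmap) is made up.

```python
def looks_binary(data: bytes, limit: int = 8000) -> bool:
    """Imitate the NUL-byte test: binary if a NUL appears within `limit` bytes."""
    return b"\x00" in data[:limit]


# Text encoded as UTF-16LE: every ASCII character is followed by a NUL byte.
utf16_text = "Write-Host 'hello'\r\n".encode("utf-16-le")

# A plain (ASCII) PBM bitmap: an image made only of printable characters
# and newlines, so it contains no NUL bytes at all.
ascii_image = b"P1\n2 2\n0 1\n1 0\n"

print(looks_binary(utf16_text))   # True: text that would be guessed as binary
print(looks_binary(ascii_image))  # False: an image that would be guessed as text
```

Files like these two are the ones worth listing explicitly, the first with the text attribute and the second with binary.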

bk2204 answered Oct 24 '22 01:10