 

Git and binary data

Tags:

git

I'm currently starting to use Git for my version control system, but I do a fair bit of web/game development, which of course requires images (binary data) to be stored. If my understanding is correct, when I commit an image and it later changes 100 times, fetching a fresh copy of that repo means I'd basically be downloading all 100 revisions of that binary file?

Isn't this an issue with large repos where images change regularly? Wouldn't the initial fetch of the repo end up becoming quite large? Has anybody experienced any issues with this in the real world? I've seen a few alternatives, for instance using submodules and keeping images in a separate repo, but this only keeps the codebase smaller; the image repo would still be huge. Basically, I'm just wondering if there's a nice solution to this.

Jamie asked Dec 15 '09


People also ask

Can you use Git with binary files?

Git LFS is a Git extension used to manage large and binary files outside the main Git repository. Most projects today have both code and binary assets, and storing large binary files directly in a Git repository can be a bottleneck for Git users.
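For concreteness, a minimal Git LFS setup might look like the following sketch; the *.png/*.psd patterns and the asset path are just examples, substitute whatever binary types your project tracks:

    git lfs install                     # enable LFS hooks for this user
    git lfs track "*.png" "*.psd"       # store matching files as LFS pointers
    git add .gitattributes              # "track" records the patterns here
    git add sprites/player.png          # hypothetical asset path
    git commit -m "Track image assets with Git LFS"

After this, a clone downloads only the LFS objects needed for the checked-out revision rather than every historical version of each image.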

Does Git diff binary files?

Any binary format can be diffed with Git, as long as there's a tool that converts the binary format to plain text. You just need to add the conversion handlers and attributes accordingly.
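As a sketch of that approach, Git's textconv mechanism can turn image metadata into diffable text. This example assumes exiftool is installed:

    # .gitattributes: route *.png through a custom diff driver
    *.png diff=exif

    # tell Git how the "exif" driver converts binaries to text
    git config diff.exif.textconv exiftool

With this in place, git diff on a changed PNG shows the metadata difference instead of just "Binary files differ".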

Can I store binaries on GitHub?

If you have only a few binaries or zip files, you can upload them to GitHub via Downloads -> Upload a new file. This feature is quite limited, though: you cannot put files in structured folders.

Should I commit binary files to Git?

It's important never to commit binary files, because once you've committed them they are in the repository history and are very annoying to remove. You can delete the files from the current version of the project, but they'll remain in the repository history, meaning that the overall repository size will still be large.
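If a large binary has already slipped into history, rewriting that history is the only way to truly shrink the repository. A hedged sketch using git filter-branch follows (assets/huge.psd is a hypothetical path; dedicated tools such as the BFG Repo-Cleaner or git filter-repo do the same job faster):

    git filter-branch --index-filter \
        'git rm --cached --ignore-unmatch assets/huge.psd' \
        --prune-empty -- --all

Note that every collaborator must re-clone or rebase onto the rewritten history afterward.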


1 Answer

I wouldn't call that "checkout", but yes: the first time you fetch the repository, provided that the binary data is huge and incompressible, the transfer is going to be what it is, huge. And yes, since the conservation law is still in effect, breaking it into submodules won't save you space or time on the initial pull of the repository.
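To put numbers on that, you can ask Git how big the object store is and which blobs dominate it. The pipeline below is an illustrative one-liner, not part of the original answer:

    # total size of all objects in the clone
    git count-objects -vH

    # ten largest blobs anywhere in history
    git rev-list --objects --all |
      git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
      awk '$1 == "blob" {print $3, $4}' | sort -rn | head -10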

One possible solution is still to use a separate repository and the --depth option when cloning it. Shallow repositories have some limitations, but I don't remember exactly what, since I've never used them. Check the docs; the keyword is "shallow". See the sketch after the quote below.

Edit: From git-clone(1):

A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it), but is adequate if you are only interested in the recent history of a large project with a long history, and would want to send in fixes as patches.
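A minimal shallow workflow, assuming the asset repository lives at a URL like the placeholder below, might be:

    git clone --depth 1 https://example.com/assets.git   # history truncated to 1 commit
    git fetch --depth 50                                 # deepen later if needed
    git fetch --unshallow                                # or pull down the full history

Newer Git versions have also relaxed several of the quoted restrictions; shallow clones can now fetch and push in many situations.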

Michael Krelin - hacker answered Sep 28 '22