Is Git worthwhile for managing many files bigger than 500 MB?

I would like to put a large amount of data under version control: a directory structure (depth <= 5) containing hundreds of files, each around 500 MB.

What I need is a system that helps me:

  • detect if a file has been changed
  • detect if files were added or removed
  • clone the entire repository to another location
  • store a "checkpoint" and restore it later

I don't need SHA-1 for change detection; something faster is acceptable.

Is Git worth it for this, or is there a better alternative?

asked Nov 19 '09 by Andrea Francia




2 Answers

As I mentioned in "What are the Git limits", Git is not made to manage big files (or big binary files for that matter).

Git would be needed if you need to:

  • know what has actually changed within a file. At the directory level, though, the other answers are better (Unison or rsync; see the rsync sketch after this list)
  • keep a close proximity (i.e. the "same referential") between your development data and those large resources. Having only one referential would help, but then you would need a fork of Git, like git-bigfiles, to manage them efficiently.
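
For the directory-level needs in the question (detect changes, detect additions/removals, clone, checkpoint/restore), plain rsync already covers a lot. A minimal sketch, assuming local paths; the directory names below are made up for illustration:

    # Hypothetical paths - adjust to your layout.
    SRC=/data/big-files
    MIRROR=/backup/big-files-mirror
    SNAP=/backup/big-files-snapshots

    # Detect changed/added/removed files without copying anything
    # (size + mtime comparison by default, so no SHA-1 involved):
    rsync -ani --delete "$SRC"/ "$MIRROR"/

    # Clone the entire tree to another location:
    rsync -a --delete "$SRC"/ "$MIRROR"/

    # Store a "checkpoint": a dated snapshot that hard-links files
    # unchanged since the previous snapshot, so only modified 500 MB
    # files consume new disk space (the first run simply copies everything):
    TODAY=$(date +%F)
    rsync -a --link-dest="$SNAP/latest" "$SRC"/ "$SNAP/$TODAY"/
    ln -sfn "$SNAP/$TODAY" "$SNAP/latest"

    # Restore the checkpoint taken on a given date:
    rsync -a --delete "$SNAP/2009-11-19"/ "$SRC"/

Unison covers similar ground when both locations can change and you need two-way synchronization.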

Note: if you still want to use Git, you can try this approach:

Unfortunately, rsync isn't really perfect for our purposes either.

  • First of all, it isn't really a version control system. If you want to store multiple revisions of the file, you have to make multiple copies, which is wasteful, or xdelta them, which is tedious (and potentially slow to reassemble, and makes it hard to prune intermediate versions), or check them into git, which will still melt down because your files are too big.
  • Plus rsync really can't handle file renames properly - at all.

Okay, what about another idea: let's split the file into chunks, and check each of those blocks into git separately.
Then git's delta compression won't have too much to chew on at a time, and we only have to send modified blocks...

Based on gzip --rsyncable, with a POC available in this Git repo.
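
For illustration only, here is a naive fixed-size version of that chunking idea. This is not the POC mentioned above; the file name and the 50 MB block size are arbitrary, and it assumes GNU coreutils split:

    # Split one large file into fixed-size blocks and track the blocks.
    split -b 50M bigdata.bin bigdata.bin.part-
    git add bigdata.bin.part-*
    git commit -m "Store bigdata.bin as 50 MB chunks"

    # After the file changes, re-split and re-add: blocks whose content
    # is identical hash to the same blobs, so only the modified blocks
    # add new objects to the repository.
    split -b 50M bigdata.bin bigdata.bin.part-
    git add bigdata.bin.part-*
    git commit -m "Update changed chunks of bigdata.bin"

    # Reassemble the original file from a checkout:
    cat bigdata.bin.part-* > bigdata.bin

The weakness of fixed-size blocks is that an insertion near the start of the file shifts every block after it; that is exactly what content-defined chunking in the style of gzip --rsyncable avoids.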

answered by VonC


git-annex is a solution to this problem. Rather than storing the large file data directly in git, it stores it in a key/value store. Symlinks to the keys are then checked into git as a proxy for the actual large files.

http://git-annex.branchable.com
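
A minimal usage sketch (the repository paths and file name are hypothetical; the walkthrough on the site above is the authoritative reference):

    # One-time setup inside an existing Git repository:
    git annex init "origin repo"

    # Add a large file: the content moves into .git/annex/objects and
    # a symlink pointing at it is committed in its place.
    git annex add bigdata.bin
    git commit -m "Add bigdata.bin via git-annex"

    # Clone the repository elsewhere: the clone gets the symlinks
    # immediately and fetches the large content only on demand.
    git clone /path/to/origin-repo /path/to/clone
    cd /path/to/clone
    git annex init "clone"
    git annex get bigdata.bin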

answered by Joey