Is there a distributed version control system (git, bazaar, mercurial, darcs etc.) that can handle files larger than available RAM?
I need to be able to commit large binary files (i.e. datasets, source video/images, archives), but I don't need to be able to diff them, just be able to commit and then update when the file changes.
I last looked at this about a year ago, and none of the obvious candidates allowed this, since they're all designed to diff in memory for speed. That left me with a VCS for managing code and something else ("asset management" software or just rsync and scripts) for large files, which is pretty ugly when the directory structures of the two overlap.
Can Git Handle Large Files? Git cannot handle large files on its own. That's why many Git teams add Git LFS to deal with large files in Git.
A distributed version control system (DVCS) brings a local copy of the complete repository to every team member's computer, so they can commit, branch, and merge locally. The server doesn't have to store a physical file for each branch — it just needs the differences between each commit.
Git LFS (Large File Storage) is a Git extension developed by Atlassian, GitHub, and a few other open source contributors, that reduces the impact of large files in your repository by downloading the relevant versions of them lazily.
An open source Git extension for versioning large files. Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
It's been 3 years since I asked this question, but, as of version 2.0 Mercurial includes the largefiles extension, which accomplishes what I was originally looking for:
The largefiles extension allows for tracking large, incompressible binary files in Mercurial without requiring excessive bandwidth for clones and pulls. Files added as largefiles are not tracked directly by Mercurial; rather, their revisions are identified by a checksum, and Mercurial tracks these checksums. This way, when you clone a repository or pull in changesets, the large files in older revisions of the repository are not needed, and only the ones needed to update to the current version are downloaded. This saves both disk space and bandwidth.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With