 

Is there a distributed VCS that can manage large files?

Is there a distributed version control system (git, bazaar, mercurial, darcs etc.) that can handle files larger than available RAM?

I need to be able to commit large binary files (i.e. datasets, source video/images, archives), but I don't need to be able to diff them, just be able to commit and then update when the file changes.

I last looked at this about a year ago, and none of the obvious candidates allowed this, since they're all designed to diff in memory for speed. That left me with a VCS for managing code and something else ("asset management" software or just rsync and scripts) for large files, which is pretty ugly when the directory structures of the two overlap.

asked Sep 16 '08 by joelhardi

People also ask

Can Git handle large binary files?

Git cannot handle large files well on its own: every version of every file is kept in the repository, so large binaries bloat clones and slow down everyday operations. That's why many Git teams add Git LFS to deal with large files.
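For example, a typical LFS setup looks like the following; the "*.psd" pattern and file name are only illustrations, so substitute whatever large file types you actually have:

    # Install the LFS hooks once per repository (or per machine with --system)
    git lfs install

    # Tell LFS which file patterns to manage; the rules are written to .gitattributes
    git lfs track "*.psd"

    # Commit the tracking rules along with the large file itself
    git add .gitattributes design.psd
    git commit -m "Track Photoshop files with Git LFS"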

How does a distributed version control system work?

A distributed version control system (DVCS) gives every team member a complete local copy of the repository, so they can commit, branch, and merge locally. The server doesn't have to store a physical file for each branch; it only needs the data that changed between commits.
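As a sketch of that workflow in Git (the branch name "experiment" and remote "origin" are just placeholders):

    git commit -am "Work offline"    # records a commit in the local repository
    git branch experiment            # branching is a local, near-instant operation
    git checkout experiment
    git merge main                   # merging also happens locally
    git push origin experiment       # the server is only contacted on push/pull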

What are LFS files?

Git LFS (Large File Storage) is a Git extension, developed by Atlassian, GitHub, and a few other open source contributors, that reduces the impact of large files in your repository by downloading the relevant versions of them lazily.

What is git LFS used for?

An open source Git extension for versioning large files. Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
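What ends up in the Git repository itself is only a small text pointer; the actual bytes live on the LFS server. A pointer file looks roughly like this (the hash and size below are made-up examples):

    version https://git-lfs.github.com/spec/v1
    oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
    size 12345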


1 Answer

It's been 3 years since I asked this question, but as of version 2.0, Mercurial includes the largefiles extension, which accomplishes what I was originally looking for:

The largefiles extension allows for tracking large, incompressible binary files in Mercurial without requiring excessive bandwidth for clones and pulls. Files added as largefiles are not tracked directly by Mercurial; rather, their revisions are identified by a checksum, and Mercurial tracks these checksums. This way, when you clone a repository or pull in changesets, the large files in older revisions of the repository are not needed, and only the ones needed to update to the current version are downloaded. This saves both disk space and bandwidth.
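For anyone trying it, the setup is short. Enable the bundled extension in your .hgrc (or Mercurial.ini):

    [extensions]
    largefiles =

Then add big binaries with the --large flag and commit as usual (the file name here is only an example):

    hg add --large source-video.avi
    hg commit -m "Add source video as a largefile"

Files added this way are tracked by checksum, and hg update only downloads the largefile revisions needed for the working copy.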

answered Oct 02 '22 by joelhardi