
Rejecting large files in git

We have recently started using git and hit a nasty problem when someone committed a large (~1.5GB) file, which then caused git to crash on various 32-bit OSes. This appears to be a known bug (git mmaps files into memory, which fails if it can't get enough contiguous address space), and it isn't going to be fixed any time soon.

The easy (for us) solution would be to get git to reject any commits larger than 100MB or so, but I can't figure out a way to do that.

EDIT: The problem comes from accidental submission of a large file, in this case a large dump of program output. The aim is to avoid accidental submissions, because if a developer does accidentally commit a large file, getting it back out of the repository costs an afternoon during which no-one can do any work, and everyone then has to fix up whatever local branches they have.
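On the client side, a pre-commit hook can stop the accidental add before it ever reaches a shared repository. The following is a minimal sketch in bash; the function name and the 100MB limit are illustrative choices, not anything from the question:

```shell
#!/bin/bash
# Sketch of a client-side .git/hooks/pre-commit hook; the function name
# and the 100MB default limit are my own choices, not from the question.

check_staged_sizes() {
    local limit=${1:-$((100 * 1024 * 1024))} file size fail=0
    # Walk the files staged for this commit (added or modified) and
    # measure the blob in the index, not the working-tree copy.
    while IFS= read -r file; do
        size=$(git cat-file -s ":$file" 2>/dev/null) || continue
        if [ "$size" -gt "$limit" ]; then
            echo "error: $file is $size bytes (limit $limit); commit aborted" >&2
            fail=1
        fi
    done < <(git diff --cached --name-only --diff-filter=AM)
    return $fail
}

# Installed as a hook, simply run the check; a non-zero exit aborts the commit.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    check_staged_sizes
fi
```

Note that hooks are not copied by clone, so this only protects developers who install it in their own repository, which is why a server-side check (as in the answer below) is the more reliable enforcement point.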

Asked May 13 '09 by Chris Jefferson



1 Answer

When exactly did the problem occur? When they committed the file originally, or when it got pushed elsewhere? If you have a staging repo that everyone pushes to, you could implement an update hook to scan incoming refs for large files, alongside any other permission checks.

Very rough and ready example:

git --no-pager log --pretty=oneline --name-status $2..$3 -- | \
  perl -MGit -lne '
    if (/^[0-9a-f]{40}/) { ($rev, $message) = split(/\s+/, $_, 2) }
    else {
      ($action, $file) = split(/\s+/, $_, 2);
      next unless $action eq "A";
      $filesize = Git::command_oneline("cat-file", "-s", "$rev:$file");
      print "$rev added $file ($filesize bytes)";
      die "$file too big" if ($filesize > 1024*1024*1024);
    }'

(just goes to show, everything can be done with a Perl one-liner, although it might take multiple lines ;))

Called the way $GIT_DIR/hooks/update is called (the arguments are ref-name, old-rev, new-rev; e.g. "refs/heads/master master~2 master"), this will list the files added and abort if any added file is too big.
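For comparison, the same server-side check can be sketched as a plain-shell update hook. This version is mine, not the original poster's: the function name, the 100MB limit, and the new-ref handling are illustrative assumptions:

```shell
#!/bin/bash
# Sketch of $GIT_DIR/hooks/update -- git invokes it once per ref being
# updated, as: update <refname> <oldrev> <newrev>. Names and the 100MB
# limit below are illustrative, not from the original answer.

check_range_sizes() {
    local range=$1 limit=$2 sha path
    # List every object in the pushed range together with the path it
    # appears under, and reject any blob bigger than the limit.
    while read -r sha path; do
        [ -n "$path" ] || continue                          # commits and the root tree have no path
        [ "$(git cat-file -t "$sha")" = blob ] || continue  # skip subtrees
        local size
        size=$(git cat-file -s "$sha")
        if [ "$size" -gt "$limit" ]; then
            echo "error: $path is $size bytes (limit $limit); push rejected" >&2
            return 1
        fi
    done < <(git rev-list --objects "$range")
}

if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -eq 3 ]; then
    oldrev=$2 newrev=$3
    zero=0000000000000000000000000000000000000000
    # A newly created ref has an all-zero old revision: scan the new tip only.
    if [ "$oldrev" = "$zero" ]; then range=$newrev; else range=$oldrev..$newrev; fi
    check_range_sizes "$range" $((100 * 1024 * 1024)) || exit 1
fi
```

Unlike the perl version above, this scans every blob reachable in the pushed range rather than only files with an "A" status, so it also catches large files that arrive via a modification.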

Note that if you're going to police this sort of thing, you need a centralised point at which to do it. If you trust your team simply to exchange changes with each other, you should trust them to learn that adding giant binary files is a bad idea.

Answered Sep 27 '22 by araqnid