Is it possible to store only a checksum of a large file in git?

Question

I'm a bioinformatician currently extracting normal-sized sequences from genomic files. Some of the genomic files are large enough that I don't want them in the main git repository, but I am committing the extracted sequences to git.

Is it possible to tell git, "Here's a large file: don't store the whole file, just take its checksum, and let me know if that file is missing or modified"?

If that's not possible, I guess I'll have to either git-ignore the large files or, as suggested in this question, store them in a submodule.
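Git has no built-in "checksum only" mode, but the git-ignore route can be combined with a committed checksum manifest to get the "missing or modified" check. The script below is a minimal sketch under stated assumptions: the large files are listed in .gitignore, and a hypothetical manifest file, large_files.sha256, is committed in their place. The script, file and function names are illustrative, not part of git. The manifest uses the same "<digest>  <path>" line format as sha256sum, so for ordinary file names `sha256sum -c large_files.sha256` should be able to check it as well.

```python
"""Minimal sketch: commit a checksum manifest instead of the large files.

Assumptions: the large files are listed in .gitignore, and a hypothetical
manifest, large_files.sha256, is committed to git in their place.
"""
import hashlib
import sys
from pathlib import Path

MANIFEST = Path("large_files.sha256")  # hypothetical manifest name


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large genomes never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record(paths, manifest: Path = MANIFEST) -> None:
    """Write one '<hex digest>  <path>' line per file (sha256sum's format)."""
    manifest.write_text("".join(f"{sha256_of(Path(p))}  {p}\n" for p in paths))


def verify(manifest: Path = MANIFEST) -> dict:
    """Map every manifest entry to 'ok', 'missing' or 'modified'."""
    status = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines in a hand-edited manifest
        expected, name = line.split("  ", 1)
        path = Path(name)
        if not path.exists():
            status[name] = "missing"
        elif sha256_of(path) != expected:
            status[name] = "modified"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python track_large.py record FILE...  |  python track_large.py verify
    if sys.argv[1] == "record":
        record(sys.argv[2:])
    elif sys.argv[1] == "verify":
        problems = {n: s for n, s in verify().items() if s != "ok"}
        for name, state in problems.items():
            print(f"{state}: {name}")
        sys.exit(1 if problems else 0)
```

Running the hypothetical `verify` subcommand prints one line per missing or modified file and exits non-zero, so it could be called by hand or from a git hook; that wiring is an assumption, not something git sets up for you.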

Scott Chacon · Accepted Answer

I wrote a script that does this sort of thing. You list file patterns in the .gitattributes file for the large media you don't want going into your git repo, and it can store those files on S3 instead. It's just a starting point, but I think it's usable if you're interested.

http://github.com/schacon/git-media

Maybe that will help you, or at least show you how it could be done and you can customize it for your specific needs.
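Tools driven by .gitattributes patterns typically rely on git's clean/smudge filter drivers: the clean filter runs when a file is staged and decides what is actually stored in the repository, and the smudge filter runs on checkout. The sketch below is not git-media's code; it assumes a hypothetical filter named `checksum`, a hypothetical script `sumfilter.py`, and a local cache directory standing in for S3. The repository then stores only a one-line `sha256:<digest>` stub per matching file.

```python
"""Minimal clean/smudge filter sketch: the repo stores only a SHA-256 stub.

Assumed wiring (the filter name "checksum" and script name are hypothetical):
    .gitattributes:  *.fa filter=checksum
    git config filter.checksum.clean  "python sumfilter.py clean"
    git config filter.checksum.smudge "python sumfilter.py smudge"
"""
import hashlib
import sys
from pathlib import Path

CACHE = Path.home() / ".bigfile-cache"  # hypothetical local store standing in for S3
PREFIX = b"sha256:"
HEX = set(b"0123456789abcdef")


def is_stub(data: bytes) -> bool:
    """True if data is exactly one 'sha256:<64 lowercase hex chars>' line."""
    body = data.strip()
    digest = body[len(PREFIX):]
    return body.startswith(PREFIX) and len(digest) == 64 and set(digest) <= HEX


def clean(data: bytes, cache: Path = CACHE) -> bytes:
    """Staging (working tree -> repo): cache the bytes, hand git a stub."""
    if is_stub(data):
        return data  # already a stub (e.g. content was never fetched): keep it
    # Reads the whole file into memory; a real tool would stream large files.
    digest = hashlib.sha256(data).hexdigest()
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / digest
    if not target.exists():
        target.write_bytes(data)
    return PREFIX + digest.encode() + b"\n"


def smudge(data: bytes, cache: Path = CACHE) -> bytes:
    """Checkout (repo -> working tree): swap a stub back for the cached bytes."""
    if not is_stub(data):
        return data  # not one of our stubs: pass through unchanged
    target = cache / data.strip()[len(PREFIX):].decode()
    # If the content is missing locally, leave the stub so the gap stays visible.
    return target.read_bytes() if target.exists() else data


if __name__ == "__main__" and len(sys.argv) > 1:
    # git pipes the file contents in on stdin and reads the result from stdout.
    handler = clean if sys.argv[1] == "clean" else smudge
    sys.stdout.buffer.write(handler(sys.stdin.buffer.read()))
```

With a filter like this in place, `git status` should report a file as modified when its contents (and therefore its stub) change, and as deleted when the file is missing, which is close to what the question asks for.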

http://github.com/schacon/git-media

Maybe that will help you, or at least show you how it could be done and you can customize it for your specific needs.

Tags: git, large-files

Andrew Grimm
