I'm a bioinformatician currently extracting normal-sized sequences from genomic files. Some genomic files are large enough that I don't want to put them into the main git repository, whereas I'm putting the extracted sequences into git.
Is it possible to tell git, "Here's a large file - don't store the whole file, just take its checksum, and let me know if that file is missing or modified"?
If that's not possible, I guess I'll have to either git-ignore the large files, or, as suggested in this question, store them in a submodule.
I wrote a script that does this sort of thing. You list file patterns in the .gitattributes file for the large media you don't want in your git repo, and it can store those files on S3 instead. It's just a starting point, but I think it's usable if you're interested.
http://github.com/schacon/git-media
Maybe that will help you, or at least show you how it could be done and you can customize it for your specific needs.
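If a full tool is more than you need, the checksum idea from the question can also be sketched in plain Python. This is a minimal sketch that is independent of git-media and not taken from it: the function names and the `large_files.json` manifest name are made up for illustration, and it assumes the large files themselves are listed in .gitignore while only the manifest is committed.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large genomic file never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record(paths, manifest="large_files.json"):
    """Write a manifest mapping each path to its checksum (hypothetical file name).

    The manifest is small, so it can be committed to git in place of the files.
    """
    entries = {str(p): sha256_of(p) for p in paths}
    Path(manifest).write_text(json.dumps(entries, indent=2, sort_keys=True) + "\n")
    return entries


def verify(manifest="large_files.json"):
    """Return a dict of path -> 'ok', 'missing', or 'modified' for each manifest entry."""
    entries = json.loads(Path(manifest).read_text())
    status = {}
    for name, expected in entries.items():
        if not Path(name).is_file():
            status[name] = "missing"
        elif sha256_of(name) != expected:
            status[name] = "modified"
        else:
            status[name] = "ok"
    return status
```

You would call `record([...])` once after downloading the genomic files and commit the manifest, then call `verify()` before running the extraction (or from a git hook, if you want it automatic) to find out whether any file is missing or has changed.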