Advantage to Git LFS when large files don't change?

Tags: git, git-lfs

I'm considering using Git LFS for a repository that will contain ISO and installer files that are used by our system image build tools (in this case Packer). We'll then add it as a submodule of our main repo that has the build scripts so it can be integrated into our CI toolchain.

As I understand Git LFS, the large files are replaced with small pointer files, so repository pulls and maintenance stay quick, and the files themselves are downloaded over a separate channel.
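To make the pointer idea concrete, here is a minimal Python sketch of the text Git commits in place of a large file. It is based on the Git LFS v1 pointer format: a `version` line, a SHA-256 `oid`, and a `size` in bytes. The function name `lfs_pointer` and the 10 MiB stand-in file are illustrative assumptions. In practice the `git lfs` clean/smudge filters produce and resolve pointers; you would not write code like this yourself.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS v1 pointer file for the given content (sketch)."""
    oid = hashlib.sha256(content).hexdigest()  # LFS identifies objects by SHA-256
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    fake_iso = b"\x00" * (10 * 1024 * 1024)  # hypothetical 10 MiB stand-in for an ISO
    pointer = lfs_pointer(fake_iso)
    print(pointer)
    print(f"{len(fake_iso)} bytes of content -> {len(pointer.encode())}-byte pointer in Git")
```

Git history only ever sees the few-line pointer. The real bytes live in LFS storage and are fetched when the file is checked out.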

However, when we add the files they'll have the version number in the name so they won't need to be updated (e.g. ubuntu-16.04.4-server-amd64.iso). They also won't need to be removed because we'll reference specific versions by that full name in build scripts. We'll basically always be adding and rarely (if ever) updating or deleting.

It seems like Git LFS is mainly for updating / deleting. Are there any remaining technical advantages for our use case?

N Jones asked Jan 28 '23 04:01



1 Answer

"It seems like Git LFS is mainly for updating / deleting."

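Git LFS is mostly about keeping the repository size down. A git clone normally downloads the entire repository, so git-lfs mostly affects cloning. The repository includes all files and all versions of those files, including the deleted ones. With git-lfs, history holds only small pointers, and the large files are fetched from LFS storage when they are checked out.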

If you make a minor Ubuntu update, git rm ubuntu-16.04.4-server-amd64.iso and git add ubuntu-16.04.5-server-amd64.iso, you're now storing two ISOs. Another update and it's three. Then four. Five. Six. Without git-lfs, everyone has to download and store all of those old, deleted ISOs.
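The arithmetic behind "two ISOs, then three, then four" can be sketched as a rough model of what a fresh clone downloads for the ISO files alone. The 900 MB ISO size and the roughly 133-byte pointer size are hypothetical round numbers. The model also ignores Git's compression, since ISO contents are largely already compressed and delta poorly. The function name `clone_download_mb` is made up for illustration.

```python
# Rough, hypothetical model of what a fresh clone transfers for ISO files only.
ISO_MB = 900          # assumed size of one server ISO, in MB
POINTER_BYTES = 133   # approximate size of one Git LFS pointer file


def clone_download_mb(isos_ever_added: int, isos_in_checkout: int, use_lfs: bool) -> float:
    """Estimate MB downloaded by a fresh clone for ISO content."""
    if use_lfs:
        # History carries only pointers; by default LFS downloads the objects
        # for the files present in the checked-out commit.
        pointers_mb = isos_ever_added * POINTER_BYTES / 1_000_000
        return isos_in_checkout * ISO_MB + pointers_mb
    # Plain Git: every ISO version ever committed travels with the clone.
    return float(isos_ever_added * ISO_MB)


if __name__ == "__main__":
    # One ISO in the working tree; each update does `git rm` old + `git add` new.
    for versions in range(1, 7):
        plain = clone_download_mb(versions, 1, use_lfs=False)
        lfs = clone_download_mb(versions, 1, use_lfs=True)
        print(f"{versions} version(s): plain Git ~{plain:.0f} MB, git-lfs ~{lfs:.0f} MB")
```

Under these assumptions, the plain-Git clone grows linearly with every update. The git-lfs clone stays at about one ISO plus a few hundred bytes of pointers.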

If you're going to store large files like operating system ISOs or media files, they will rapidly bloat the repository. Anyone cloning it will have to spend the time and bandwidth to download everything, and the disk space to store it. This slows down your development process and makes people hesitant to download a 20 GB repository just to work on a few text files.

"Are there any remaining technical advantages for our use case?"

Yes. There's little cost to using git-lfs. That cost is lowest if you use it sooner rather than later.

You can adopt git-lfs later, but there are some strings attached. If you start using it on existing files, they'll be in git-lfs going forward, but their old versions will still be in history. You can use the BFG Repo-Cleaner to rewrite history and retroactively move existing large files into git-lfs, but rewriting your entire history is not something you want to do often. You should probably use git-lfs sooner rather than later.
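Why late adoption leaves weight behind can be shown with a small hypothetical model. Each commit that added an ISO is marked by whether `git lfs track` was already in effect. Tracked files cost only a pointer in history. Untracked files cost a full blob that stays until history is rewritten. The 900 MB size and the function name `history_mb` are illustrative assumptions, not measurements.

```python
# Hypothetical model: how much ISO data stays in ordinary Git history,
# depending on when git-lfs tracking was turned on.
ISO_MB = 900             # assumed size of one ISO, in MB
POINTER_MB = 133 / 1e6   # approximate size of one LFS pointer, in MB


def history_mb(tracked_by_lfs: list[bool]) -> float:
    """Sum the Git-history cost of each commit that added an ISO.

    True means the ISO was committed after `git lfs track` (stored as a pointer);
    False means it was committed before (stored as a full blob, which stays
    in history unless history is rewritten).
    """
    return sum(POINTER_MB if tracked else ISO_MB for tracked in tracked_by_lfs)


if __name__ == "__main__":
    adopted_late = [False, False, False, True, True, True]
    adopted_early = [True] * 6
    print(f"adopted late:  ~{history_mb(adopted_late):.0f} MB in history")
    print(f"adopted early: ~{history_mb(adopted_early):.3f} MB in history")
```

In this model, a history rewrite effectively turns every `False` into `True`. That is the cost you avoid by tracking the files from the start.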

Here is a good run-down about what it takes to switch over later.

Using git-lfs early means developers don't have to agonize over whether something is too big to put in the repository. If they feel something should be in version control, they put it in version control, regardless of size. This simplifies their decision-making and makes for a healthier repository. If you need, say, six different operating system ISOs in the repository for testing, you can add them without a debate about repository bloat.

It also means you don't need workarounds for repository bloat. There are various ways to clone only part of a repository, but they all add complexity. There are also ways to let Git store compressed ISOs and archives more efficiently, such as unpacking them and letting Git store the contents as normal files, but again that adds complexity. git-lfs lets you keep things simple(r).

Finally, the storage side of git-lfs is flexible. You're not beholden to GitHub or any particular Git host for LFS storage.

Schwern answered Feb 04 '23 09:02
