
Does keeping (large-ish) binary files in a Git repository affect performance for operations besides cloning?

Tags:

git

I've read the existing questions about storing binary files in a Git repository, but some aspects are still not clear.

The repository contains around 50 MB of code sources and around 1 GB of binary files. The binary files are seldom changed.

  1. Is the performance of the usual daily workflow negatively affected by the binary files? I mean operations like committing changed, moved, or moved-and-changed files; merging; pulling and pushing. The operations in question don't involve said binary files.
  2. From a performance point of view (e.g. RAM, CPU, HDD access), is there any merit to removing these files from select branches, as opposed to completely removing the files from the repository and its history?
Vsevolod Golovanov asked Jul 13 '15 10:07



2 Answers

If the files are never involved, it doesn't make any difference in terms of performance.
A commit only stores new data for the files it modifies, so when a commit is processed, the files it does not touch don't really matter, whether they are 1 KB or 1 GB. If a file does appear in a commit it will obviously matter, as binary files are typically slower to deal with.
Now, the main problem is that cloning a repository is not the only operation that reads commits. For instance, when you switch to a different branch, Git has to rewrite every file in the working tree that differs between the two commits, and when merging or rebasing, Git has to compare the commits involved to find the differences.
Basically, whenever Git has to read a commit that modifies a binary file, performance will very likely be affected, and because of the way Git works, commits get "used" quite often.
As for your question, it basically depends on what you mean by "seldom changed". As long as the branches you typically work on don't contain modifications to the binary files, this shouldn't be a problem, but if checking out different commits means picking up such modifications, performance will be affected.
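One way to check whether switching between two branches would touch the binaries at all is to list the paths that differ between them. The following is a minimal sketch in a throwaway repository; the branch name `feature` and the file names `asset.bin` and `main.c` are made up for illustration.

```shell
set -e

# Build a throwaway repository (all names here are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# One "binary" asset and one source file on the initial branch.
head -c 1048576 /dev/urandom > asset.bin
echo 'int main(void) { return 0; }' > main.c
git add .
git commit -q -m "initial import"
base=$(git rev-parse --abbrev-ref HEAD)

# A feature branch that only touches source code.
git checkout -q -b feature
echo '/* tweak */' >> main.c
git commit -q -am "edit source only"

# Paths that differ between the two branches. Only these need to be
# rewritten in the working tree when switching from one to the other.
changed=$(git diff --name-only "$base" feature)
echo "$changed"
```

If the binary files never show up in that list for the branches you move between, checkouts between them should not have to rewrite those files.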

Juan answered Oct 07 '22 18:10



It can affect operations like git gc or git repack, where deltification is performed. See "Are Git's pack files deltas rather than snapshots?".
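If the binaries do stay in the repository, one option worth trying is to tell Git not to attempt delta compression for them via the `delta` attribute in `.gitattributes`. This is a sketch, not a measured recommendation; the `*.bin` pattern and the file name are assumptions, and whether it helps depends on the files involved.

```shell
set -e

# Throwaway repository for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Unset the delta attribute for paths matching *.bin (hypothetical pattern),
# so delta compression is not attempted for those blobs.
echo '*.bin -delta' > .gitattributes
head -c 1048576 /dev/urandom > asset.bin
git add .
git commit -q -m "add asset with delta disabled"

# Confirm that the attribute applies to the file.
attr=$(git check-attr delta -- asset.bin)
echo "$attr"

# Repack, then report loose object and pack sizes.
git gc --quiet
git count-objects -v
```

Comparing the time taken by `git gc` with and without the attribute on a copy of the real repository would show whether the setting makes a difference there.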

That is why I generally store in version control only a text file declaring where to find the binaries I need, rather than the binaries themselves. See "git include compiled dll from another repository" as an example.
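A rough sketch of that idea, under stated assumptions: the manifest name `binaries.sha256`, the `store` directory (a stand-in for a shared drive or artifact server), and `lib/vendor.dll` are all hypothetical, and a real setup would likely download from a URL instead of copying from a local directory.

```shell
set -e

work=$(mktemp -d)
store="$work/store"   # stand-in for a shared drive or artifact server (hypothetical)
repo="$work/repo"
mkdir -p "$store" "$repo/lib"

# Publish one binary to the store, named by its SHA-256 digest.
printf 'pretend this is a compiled DLL' > "$work/vendor.dll"
digest=$(sha256sum "$work/vendor.dll" | cut -d' ' -f1)
cp "$work/vendor.dll" "$store/$digest"

# The only thing that would be committed to Git: a small text manifest
# in the format understood by sha256sum (digest, two spaces, path).
cd "$repo"
printf '%s  %s\n' "$digest" "lib/vendor.dll" > binaries.sha256

# Fetch step: copy each listed binary out of the store.
while read -r sum path; do
    cp "$store/$sum" "$path"
done < binaries.sha256

# Verify that every fetched file matches the digest in the manifest.
result=$(sha256sum --check binaries.sha256)
echo "$result"
```

With this layout the fetched binaries would be listed in `.gitignore`, so the repository history only ever grows by the size of the manifest.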

VonC answered Oct 07 '22 18:10
