Git repository on server is much bigger than local clone with all branches

Tags:

git

gitblit

We are currently facing a strange situation: a repository that is only 65 MB as a local clone takes up 12 GB on the server (GitBlit, but that should not matter). I have tried several things to find out what is going wrong; here is the list:

  • Ran git ls-tree -r -t -l --full-name HEAD > stats.txt for each branch on the server and collected the output.
  • Analysed the result with cut -c53-60 <filename> | grep -v '-' | awk '{ sum += $1 } END { print sum }' to sum up all file sizes.
  • The result was ~150 MB.

So we did not find any commit with big files in it.
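A sketch of a less fragile way to add up the sizes from the ls-tree listing: instead of cutting fixed character columns, it reads the size from the fourth whitespace-separated field. The function name sum_ls_tree_sizes is made up for illustration. Keep in mind that git ls-tree lists only the tree of the commit it is given (here each branch tip), so objects that exist only in older commits, or that are no longer referenced at all, do not show up in this sum.

```shell
# Hypothetical helper: sum the blob sizes from `git ls-tree -l` output on stdin.
# Each line looks like: <mode> <type> <object> <size><TAB><path>
# Tree and submodule entries report "-" as size and are skipped.
sum_ls_tree_sizes() {
  awk '$2 == "blob" { sum += $4 } END { printf "%.0f\n", sum }'
}
```

Inside a repository it would be used as git ls-tree -r -t -l --full-name HEAD | sum_ls_tree_sizes, once per branch.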

My local directory .git/objects/pack contains a pack file that is currently 17 MB (after a GC; before it was 21 MB). The pack files on the server currently total 12 GB.

I cloned the repository in the normal way with git clone https://myserver.mycompancy.com/gitblit/r/projectID/projectID.git and got a local copy. To be sure, I then ran git fetch --all, which changed nothing.

So what can we do to find the reason why the pack files on the server are much bigger? GitBlit has an automatic GC running that will pack loose objects older than 7 days.


Update: As recommended, I ran git verify-pack -v on both my local clone and the server. Here are the results (statistics only):

  • Lines of output
    • Local: 60,156
    • Server: 16,456,844

So the output for the pack file on the server is more than two orders of magnitude (~270 times) longer, which alone explains the difference in pack size. What should the next steps be to find the reason for so many more lines? Is some aspect of the statistics more interesting than the rest?
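One possible next step is to break the verify-pack output down by object type, to see whether the extra lines are mostly blobs, trees or commits. The following is only a sketch; summarize_verify_pack is a hypothetical helper name, and it assumes the usual verify-pack -v line layout of object name, type, size, size in pack and offset.

```shell
# Hypothetical helper: read `git verify-pack -v` output on stdin and print,
# per object type, the number of objects and their total size inside the pack.
# Summary lines ("non delta: ...", "chain length = ...", "<pack>: ok") are ignored
# because their second field is not an object type.
summarize_verify_pack() {
  awk '$2 ~ /^(commit|tree|blob|tag)$/ { n[$2]++; bytes[$2] += $4 }
       END { for (t in n) printf "%s %d %.0f\n", t, n[t], bytes[t] }' | sort
}
```

It would be fed with git verify-pack -v .git/objects/pack/pack-*.idx | summarize_verify_pack. Sorting the raw output by its third column (sort -k 3 -n | tail -10) lists the largest objects instead.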

mliebelt asked Jan 15 '16 10:01


1 Answer

See my ticket on GitHub about the problem. Here is a summary of what we did:

  • We saw that the server repo was much bigger than the client one (> 270 times).
  • We got some details about the pack file (which is the reason the server repo is so much bigger) with the command git verify-pack -v (thanks to @max360).
  • The size of the output alone (similar to the size of the pack file itself) showed us that the pack contains many more objects.
  • We don't know the reason for that, and we had thought that GitBlit would reduce it automatically (which it didn't), but after a git gc --prune --aggressive, the former 12 GB pack file shrank to ~110 MB.

We have no idea what went wrong so that the repository was bloated, but at least we found a way to shrink it again.
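For anyone repeating this, the repack can be wrapped so that the effect is measured before and after. This is a minimal sketch, not what we ran at the time: pack_size_kib and shrink_repo are made-up names, the repository path is passed in by the caller, and it assumes a Git version that supports git -C.

```shell
# Hypothetical helper: extract the size-pack value (in KiB) from
# `git count-objects -v` output on stdin.
pack_size_kib() {
  awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Hypothetical wrapper: report the pack size of the repository at $1
# before and after the same gc invocation mentioned above.
shrink_repo() {
  repo=$1
  before=$(git -C "$repo" count-objects -v | pack_size_kib)
  git -C "$repo" gc --prune --aggressive
  after=$(git -C "$repo" count-objects -v | pack_size_kib)
  echo "size-pack: ${before} KiB -> ${after} KiB"
}
```

On the server this would be called as shrink_repo /path/to/projectID.git, ideally on a backup copy first, since pruning removes unreachable objects.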

@James Moger explained in the GitHub ticket that GC in GitBlit is an experimental feature and that, because GitBlit uses JGit instead of the Git binary, the result of a GC done by GitBlit may differ from that of the git gc command above.

mliebelt answered Sep 28 '22 05:09