
Migrated Git repo is way smaller than the original

Tags: git, gitlab

I have a repository stored on the filesystem that I need to migrate to an HTTPS Git repository. The issue is that the migrated repo is smaller than the original: 179 MB vs 545 MB, to be precise.

This is what the original repo looks like:

$ tree -L 2 .git

.git/
├── branches
├── config
├── FETCH_HEAD
├── HEAD
├── hooks
├── index
├── logs
│   └── refs
├── objects
│   ├── incoming_1638816568970138516.pack
│   ├── incoming_2231423675192085195.pack
│   ├── incoming_252567842603709439.pack
│   ├── incoming_2956015230264054740.pack
│   ├── incoming_3048626775278812485.pack
│   ├── incoming_3322152774343971530.pack
│   ├── incoming_3707332777993276763.pack
│   ├── incoming_407171399829023385.pack
│   ├── incoming_4072000993266381297.pack
│   ├── incoming_4293432441900999175.pack
│   ├── incoming_4833572675284287989.pack
│   ├── incoming_4943537936436869872.pack
│   ├── incoming_5555086829860720971.pack
│   ├── incoming_5912835395946639495.pack
│   ├── incoming_7273182803237175093.pack
│   ├── incoming_7510898138918506599.pack
│   ├── incoming_7865231230366160752.pack
│   ├── incoming_8724975206375007218.pack
│   ├── incoming_8787762604831244623.pack
│   ├── incoming_9046531469688239004.pack
│   ├── info
│   └── pack
└── refs
    ├── heads
    ├── remotes
    └── tags


$ git branch -a

  cli
  max
  codefactoring
* master
  new-load-configuration
  new-loader
  plugins_dev
  remotes/origin/cli
  remotes/origin/max
  remotes/origin/codefactoring
  remotes/origin/master

$ du -sh .
545M    .

This is the migration procedure I've followed:

$ mkdir temp_dir && cd temp_dir
$ git clone --mirror /path/to/original/repo
$ cd /path/to/original/repo
$ git remote add new-origin  https://[email protected]/myuser/repo.git
$ git push new-origin --mirror

And then, if I look at the resulting repo size, it's 179MB.

Any idea of what is happening here?

Thank you.

asked Sep 29 '16 11:09 by Delta



2 Answers

The objects stored in the cloned repository are packed as part of the transfer, before they arrive. That way the clone starts out freshly packed and well compressed, so it stays small while still containing all the information of the original repository.

The original repository, however, has likely evolved over time, so it is possibly fragmented and not packed as efficiently. It may not be completely packed at all, and may still contain unoptimized (loose) objects or even objects that are no longer reachable.

You could try using git gc (or one of its more aggressive options) on the original repository to see if you can shrink it further.
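As a rough sketch of that suggestion, the helper below prints the object-store statistics, repacks, and prints them again. The function name `shrink_report` is hypothetical, and it assumes a Git version recent enough to support `git -C` and `count-objects -H`. Note that `--prune=now` permanently deletes unreachable objects, so it may be safer to try it on a copy first.

```shell
# Hypothetical helper: show object-store size before and after a repack.
# Usage: shrink_report /path/to/repo   (bare or non-bare)
shrink_report() {
    repo=$1
    echo "before:"
    # Loose object count, pack count and sizes, in human-readable units.
    git -C "$repo" count-objects -vH
    # Repack everything and drop unreachable objects immediately.
    # WARNING: unreachable objects cannot be recovered afterwards.
    git -C "$repo" gc --quiet --aggressive --prune=now
    echo "after:"
    git -C "$repo" count-objects -vH
}
```

If the "size-pack" value after the repack is close to the size of the migrated repository, the difference was mostly packing overhead and unreachable objects.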

The bottom line however is that if the clone process completed without errors, then the cloned repository will contain all the information of the original repository. That is, every commit and its data that is reachable using branches or tags will be included. So you should not need to worry about it.
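One way to check this yourself, sketched under the assumption that both repositories are reachable from your machine, is to compare the refs each one advertises. The function name `same_refs` is hypothetical. A hosting service may advertise extra internal refs, so a mismatch is a reason to look closer rather than proof of data loss.

```shell
# Hypothetical helper: compare ref names and object IDs of two repositories.
# Each argument may be a local path or a URL.
# Usage: same_refs /path/to/original https://host/user/repo.git
same_refs() {
    # List every advertised ref; skip the symbolic HEAD entry, which
    # can legitimately point at different branches in the two repos.
    a=$(git ls-remote "$1" | awk '$2 != "HEAD"' | sort)
    b=$(git ls-remote "$2" | awk '$2 != "HEAD"' | sort)
    if [ "$a" = "$b" ]; then
        echo "refs match"
    else
        echo "refs differ"
        return 1
    fi
}
```

If every branch and tag points at the same object ID on both sides, the reachable history is the same, whatever the on-disk size.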

answered Oct 17 '22 11:10 by poke


I would say that the difference is because your original repository is a non-bare one while the migrated repository is a bare one. The 545MB therefore includes the size of the working tree, which is missing in the migrated repo. Attributing the whole size difference (545MB - 179MB = 366MB) to the working tree is plausible for the following reasons:

  1. Objects in the repository are compressed while in the working tree they are not. Thus in a repository with a short enough history and/or strongly compressible contents the working tree can noticeably exceed the contents of .git.

  2. The working tree may contain untracked files (e.g. build artifacts).
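To see how much of the total belongs to the working tree, something like the following sketch may help. The function name `size_breakdown` is hypothetical, and it assumes a non-bare repository whose Git directory is the usual `.git` folder.

```shell
# Hypothetical helper: split a non-bare repo's size into total and .git,
# then list untracked and ignored files in the working tree.
# Usage: size_breakdown /path/to/original/repo
size_breakdown() {
    repo=$1
    echo "total:    $(du -sh "$repo" | cut -f1)"
    echo "git dir:  $(du -sh "$repo/.git" | cut -f1)"
    echo "untracked or ignored files:"
    # '??' marks untracked entries, '!!' marks ignored ones.
    git -C "$repo" status --porcelain --ignored | grep -E '^(\?\?|!!)' \
        || echo "  (none)"
}
```

The "git dir" figure, not the total, is the number to compare against the migrated bare repository.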

answered Oct 17 '22 10:10 by Leon