 

Why is my Git repository so much bigger than the working directory?

Tags: git

I just created a new repository and created the initial commit.

The working directory is 2 GB. But the .git directory is a whopping 15 GB.

Why is the git repository, with only a single commit, almost 8 times as large as the working directory?

Am I doing something wrong? Is there any way to fix this?

PortMan asked Dec 09 '15 01:12





1 Answer

The repository is so large because you temporarily staged the contents of the ".hg" subdirectory but did not include that data in the actual initial commit. Let's trace what happened step by step:

  1. git init: Creates a ".git" subdirectory with a small bit of metadata.

  2. git add .: This copied the entire working tree into Git's index (a.k.a. the staging area), that is, every file in your project, including everything under ".hg". Adding a file to the index writes its contents into the object database in ".git/objects" and records a pointer to that object in the ".git/index" file.

  3. git reset .hg: This removed the ".hg" subdirectory from the index. The objects already written to the object database were not removed, however, because other commits or index entries might still point to them. (Git does not track how many references point to an object; it uses tracing garbage collection, not reference counting.)

  4. git commit: This is the last command you performed. It recorded the contents of the index as a new commit in the repository. The ".hg" objects are not part of that commit, but they are still sitting in ".git/objects".
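
The four steps can be reproduced on a small scale. The following is only a sketch under assumptions: it works in a throwaway directory, and the file names (main.txt, .hg/store.dat) and the "demo" committer identity are hypothetical stand-ins for the real project. It shows that a blob staged by git add survives git reset and ends up unreachable after the commit.

```shell
#!/bin/sh
set -eu

# Scratch repository in a temporary directory (hypothetical layout).
repo=$(mktemp -d)
cd "$repo"
git init -q .                                   # step 1

# One project file plus a stand-in for the large ".hg" directory.
echo "project file" > main.txt
mkdir .hg
echo "mercurial data" > .hg/store.dat

git add .                                       # step 2: all files become blobs in .git/objects
staged_blob=$(git rev-parse :.hg/store.dat)     # blob ID the index holds for .hg/store.dat

git reset -q -- .hg                             # step 3: drops the index entries, keeps the blobs
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Initial commit"               # step 4

# The blob is still present in the object database...
still_stored=no
git cat-file -e "$staged_blob" && still_stored=yes

# ...but fsck reports it as unreachable from any commit or index entry.
unreachable=$(git fsck --unreachable 2>/dev/null | grep -c "$staged_blob" || true)

echo "blob $staged_blob: stored=$still_stored, unreachable entries=$unreachable"
```

In the real repository the same thing happened with every file under ".hg", which is where the extra gigabytes come from.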

To address your problem:

  • You can avoid the file bloat in the first place if you start with a blank repository and only add the files that you need, carefully excluding the ".hg".

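  • If you want to fix the problem after the fact, you can run git gc to remove the unused objects. By default git gc only prunes unreachable loose objects that are older than two weeks, so on a freshly created repository you need git gc --prune=now to remove them immediately.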
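
Both suggestions can be sketched in one small script. It is an illustration under assumptions, not a recipe for your exact repository: it runs in a throwaway directory, the file names and the "demo" identity are hypothetical, and using a .gitignore entry to keep ".hg" out of git add . is one possible way to exclude it. Note that --prune=now discards every unreachable object immediately, so it should only be run when no other Git process is working on the repository.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a stand-in for the ".hg" directory.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "project file" > main.txt
mkdir .hg
echo "mercurial data" > .hg/store.dat

# Recreate the problem: stage everything, then unstage ".hg".
git add .
orphan=$(git rev-parse :.hg/store.dat)   # blob that will be left behind
git reset -q -- .hg

# Prevention: with ".hg/" ignored, "git add ." no longer picks it up.
echo ".hg/" > .gitignore
git add .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Initial commit"

git count-objects -v                     # object counts and sizes before cleanup
git gc -q --prune=now                    # prune unreachable objects regardless of age
git count-objects -v                     # object counts and sizes after cleanup

# Check that the leftover blob is really gone.
if git cat-file -e "$orphan" 2>/dev/null; then pruned=no; else pruned=yes; fi
echo "orphaned blob pruned: $pruned"
```

On the real repository, only the git count-objects and git gc --prune=now lines are needed; comparing the two git count-objects reports shows how much space was reclaimed.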

Nayuki answered Oct 06 '22 16:10
