
How to find out which files take up the most space in git repo?

Tags:

git

I need to make the repo smaller. I think I can make it smaller by removing problematic binary files from git history:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch BigFile' 

And then releasing the objects:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now

(Feel free to comment if those commands are wrong.)

The problem: how do I identify those big files so that I can assess whether to remove them from git history? Most likely they are not in the working tree anymore. They have been deleted and probably also untracked with:

git rm --cached BigFile 
andriej asked Nov 15 '12 17:11




1 Answer

twalberg's answer does the trick. I wrapped it up in a loop so that you can list files in order by size:

while read -r largefile; do
    echo "$largefile" | awk '{printf "%s %s ", $1, $3 ; system("git rev-list --all --objects | grep " $1 " | cut -d \" \" -f 2-")}'
done <<< "$(git rev-list --all --objects | awk '{print $1}' | git cat-file --batch-check | sort -k3nr | head -n 20)"

head -n 20 restricts the output to the top 20. Change as necessary.
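
The loop above re-runs git rev-list once per object, which can be slow on a large repo. As a rough alternative sketch (not part of twalberg's answer), the same listing can be produced in a single pass by letting git cat-file carry the path along through its %(rest) placeholder. This assumes a git version that supports a format string for --batch-check; the function name largest_blobs is made up for illustration.

```shell
#!/usr/bin/env bash

# largest_blobs (hypothetical helper): print the N largest blobs reachable
# from any ref, largest first. Output columns: size in bytes, object id, path.
# Assumes git supports --batch-check=<format> with the %(rest) placeholder.
largest_blobs() {
    local count="${1:-20}"
    # rev-list prints "<oid> <path>"; cat-file echoes the path back as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |      # keep blobs only, drop commits and trees
        sort -k2,2nr |            # sort numerically by size, descending
        head -n "$count" |
        cut -d' ' -f2-            # drop the leading object type column
}
```

Run it from inside the repository, for example largest_blobs 20. Because it walks every ref, files that were deleted from the working tree but still live in history show up as well.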

Once you've identified the problem files, check out this answer for how to remove them.

MatrixManAtYrService answered Sep 18 '22 15:09
