Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the N largest files in a git repository?

Tags:

git

I wanted to find the 10 largest files in my repository. The script I came up with is as follows:

REP_HOME_DIR=<top level git directory> max_huge_files=10  cd ${REP_HOME_DIR} git verify-pack -v ${REP_HOME_DIR}/.git/objects/pack/pack-*.idx | \   grep blob | \   sort -r -k 3 -n | \   head -${max_huge_files} | \   awk '{ system("printf \"%-80s \" `git rev-list --objects --all | grep " $1 " | cut -d\" \" -f2`"); printf "Size:%5d MB Size in pack file:%5d MB\n", $3/1048576,  $4/1048576; }' cd - 

Is there a better/more elegant way to do the same?

By "files" I mean the files that have been checked into the repository.

like image 646
Sumit Avatar asked Feb 26 '12 19:02

Sumit


People also ask

How do I find large commits in Git?

If you want to see how much space a file occupies if not checked out, you can use %(objectsize:disk) instead of %(objectsize) .

How do I see file size in Git?

You can use either git ls-tree -r -l <revision> <path> to get the blob size at given revision, e.g. The blob size in this example is '16067'.

Where does GitHub store large files?

Storing large files on GitHub You'll need to use something called Git Large File Storage (LFS). Install Git LFS on your computer and then you can begin. In this way you don't need to have all the individual files. This file above stores all the information about each of the large files.

What is the maximum size of a Git repository?

Maximum git repository size is 10GB The total git repository size will be limited to 10GB. Warnings and error messages will be alerted to you as your repository size grows, to help ensure that you are aware of reaching or approaching the size limits.


2 Answers

I found another way to do it:

git ls-tree -r -t -l --full-name HEAD | sort -n -k 4 | tail -n 10 

Quoted from: SO: git find fat commit

like image 196
ypid Avatar answered Oct 08 '22 04:10

ypid


This bash "one-liner" displays the 10 largest blobs in the repository, sorted from smallest to largest. In contrast to the other answers, this includes all files tracked by the repository, even those not present in any branch tip.

It's very fast, easy to copy & paste and only requires standard GNU utilities.

git rev-list --objects --all \ | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \ | sed -n 's/^blob //p' \ | sort --numeric-sort --key=2 \ | tail -n 10 \ | cut -c 1-12,41- \ | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest 

The first four lines implement the core functionality, the fifth limits the number of results, while the last two lines provide the nice human-readable output that looks like this:

... 0d99bb931299  530KiB path/to/some-image.jpg 2ba44098e28f   12MiB path/to/hires-image.png bd1741ddce0d   63MiB path/to/some-video-1080p.mp4 

For more information, including further filtering use cases and an output format more suitable for script processing, see my original answer to a similar question.

macOS users: Since numfmt is not available on macOS, you can either omit the last line and deal with raw byte sizes or brew install coreutils.

like image 26
raphinesse Avatar answered Oct 08 '22 06:10

raphinesse