I'm trying to track the size of a project I'm working on. Is there an easy way to get the repository size on disk for different branches?
I tried
git count-objects -v
But it gives the same repository size for each branch.
With Git 2.31 (Q1 2021), the "git rev-list" command learned the --disk-usage option.
It has a lot of examples, but regarding branch size, the command now is:
git rev-list --disk-usage --objects HEAD..<branch_name>
For all branches:
# Report the disk size of each branch, not including objects used by the
# current branch. This can find outliers that are contributing to a
# bloated repository size (e.g., because somebody accidentally committed
# large build artifacts).
git for-each-ref --format='%(refname)' |
while read -r branch
do
    size=$(git rev-list --disk-usage --objects "HEAD..$branch")
    echo "$size $branch"
done |
sort -n
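As a variation on the loop above, here is a minimal sketch that looks only at local branches and measures them against a chosen base rather than HEAD. The function name branch_sizes and the default base "main" are assumptions for illustration, and it needs Git 2.31 or later for --disk-usage.

```shell
# Sketch: bytes on disk reachable from each local branch but not from a base ref.
# Assumes Git 2.31+; "branch_sizes" and the default base "main" are hypothetical.
branch_sizes() {
    base=${1:-main}
    # Full ref names (refs/heads/...) avoid ambiguity with tags of the same name.
    git for-each-ref --format='%(refname)' refs/heads/ |
    while read -r branch
    do
        size=$(git rev-list --disk-usage --objects "$base..$branch")
        printf '%s %s\n' "$size" "$branch"
    done |
    sort -n
}
```

Calling branch_sizes with no argument compares against main; branch_sizes develop would compare against a branch named develop. The base itself should show up with a size of 0, since nothing is reachable from it that it does not already contain.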
See commit a1db097, commit 669b458 (17 Feb 2021), and commit 16950f8, commit 3803a3a (09 Feb 2021) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 6fe12b5, 25 Feb 2021)
rev-list: add --disk-usage option for calculating disk usage
Signed-off-by: Jeff King
It can sometimes be useful to see which refs are contributing to the overall repository size (e.g., does some branch have a bunch of objects not found elsewhere in history, which indicates that deleting it would shrink the size of a clone).
You can find that out by generating a list of objects, getting their sizes from cat-file, and then summing them, like:
git rev-list --objects --no-object-names main..branch |
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
Though note that the caveats from git-cat-file(1) apply here.
We "blame" base objects more than their deltas, even though the relationship could easily be flipped.
Still, it can be a useful rough measure.
But one problem is that it's slow to run.
Teaching rev-list to sum up the sizes can be much faster for two reasons:
- It skips all of the piping of object names and sizes.
- If bitmaps are in use, for objects that are in the bitmapped packfile we can skip the oid_object_info() lookup entirely, and just ask the revindex for the on-disk size.
This patch implements a --disk-usage option which produces the same answer in a fraction of the time.
Here are some timings using a clone of torvalds/linux:
[rev-list piped to cat-file, no bitmaps]
$ time git rev-list --objects --no-object-names --all |
  git cat-file --buffer --batch-check='%(objectsize:disk)' |
  perl -lne '$total += $_; END { print $total }'
1459938510
real 0m29.635s
user 0m38.003s
sys 0m1.093s

[internal, no bitmaps]
$ time git rev-list --disk-usage --objects --all
1459938510
real 0m31.262s
user 0m30.885s
sys 0m0.376s
Even though the wall-clock time is slightly worse due to parallelism, notice the CPU savings between the two.
We saved 21% of the CPU just by avoiding the pipes.
But the real win is with bitmaps.
If we use them without the new option:
[rev-list piped to cat-file, bitmaps]
$ time git rev-list --objects --no-object-names --all --use-bitmap-index |
  git cat-file --batch-check='%(objectsize:disk)' |
  perl -lne '$total += $_; END { print $total }'
1459938510
real 0m6.244s
user 0m8.452s
sys 0m0.311s
then we're faster to generate the list of objects, but we still spend a lot of time piping and looking things up.
But if we do both together:
[internal, bitmaps]
$ time git rev-list --disk-usage --objects --all --use-bitmap-index
1459938510
real 0m0.219s
user 0m0.169s
sys 0m0.049s
then we get the same answer much faster.
For "--all", that answer will correspond closely to "du objects/pack", of course.
But we're actually checking reachability here, so we're still fast when we ask for more interesting things:
$ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10
374798628
real 0m0.429s
user 0m0.356s
sys 0m0.072s
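The bitmap timings only apply if the repository actually has a reachability bitmap, which a freshly cloned working repository often does not. The following is a minimal sketch of one way to create one before measuring, assuming Git 2.31 or later; disk_usage_with_bitmaps is a hypothetical helper name, and note that the repack step rewrites the repository's packfiles.

```shell
# Sketch: build a reachability bitmap, then total the on-disk size of a range.
# Assumes Git 2.31+; "disk_usage_with_bitmaps" is a hypothetical helper name.
# Note: "git repack -a -d" repacks everything into one pack and drops old packs.
disk_usage_with_bitmaps() {
    if [ "$#" -eq 0 ]; then
        set -- --all    # default: everything reachable from any ref
    fi
    git repack -a -d -q --write-bitmap-index &&
    git rev-list --disk-usage --objects --use-bitmap-index "$@"
}
```

For example, disk_usage_with_bitmaps v5.0..v5.10 would report the bytes for that range, and calling it with no arguments totals everything reachable from any ref.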
rev-list-options now includes in its man page:
--disk-usage
Suppress normal output; instead, print the sum of the bytes used for on-disk storage by the selected commits or objects. This is equivalent to piping the output into git cat-file --batch-check='%(objectsize:disk)', except that it runs much faster (especially with --use-bitmap-index). See the CAVEATS section in git cat-file for the limitations of what "on-disk storage" means.
With Git 2.38 (Q3 2022), "git rev-list --disk-usage" learned to take an optional value human to show the reported value in human-readable format, like "3.40MiB".
See commit 9096451 (11 Aug 2022) by Li Linchao (Cactusinhand).
(Merged by Junio C Hamano -- gitster -- in commit fddd8b4, 18 Aug 2022)
rev-list: support human-readable output for --disk-usage
Signed-off-by: Li Linchao
The '--disk-usage' option for git-rev-list was introduced in 16950f8 ("rev-list: add --disk-usage option for calculating disk usage", 2021-02-09, Git v2.31.0-rc0 -- merge).
This is very useful for people inspecting their Git repository's object usage information, but the resulting number is quite hard for a human to read.
Teach git rev-list to output a human-readable result when using '--disk-usage=human'.
rev-list-options now includes in its man page:
With the optional value human, on-disk storage size is shown in human-readable string (e.g. 12.24 Kib, 3.50 Mib).
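On a Git older than 2.38, where --disk-usage=human is not available, the raw byte count can be formatted by hand. This is a minimal sketch; human_size is a hypothetical helper, and its output format (two decimals, binary units) is my own choice and not guaranteed to match Git's.

```shell
# Sketch: format a byte count with binary units (KiB, MiB, ...).
# "human_size" is a hypothetical helper, not part of Git.
human_size() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v bytes="$1" 'BEGIN {
        split("B KiB MiB GiB TiB", unit, " ")
        i = 1
        while (bytes >= 1024 && i < 5) { bytes /= 1024; i++ }
        if (i == 1) printf "%d %s\n", bytes, unit[i]
        else        printf "%.2f %s\n", bytes, unit[i]
    }'
}
```

It could be combined with the plain option like this: human_size "$(git rev-list --disk-usage --objects HEAD..<branch_name>)".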
Here's something really ugly:
$ git rev-list HEAD | # list commits
xargs -n1 git ls-tree -rl | # expand their trees
sed -e 's/[^ ]* [^ ]* \(.*\)\t.*/\1/' | # keep only sha-1 and size
sort -u | # eliminate duplicates
awk '{ sum += $2 } END { print sum }' # add up the sizes in bytes
This will only count the blobs (not commits, trees, other), and will not account for either packing or cross-branch object sharing. But it could serve as the basis for something useful.
Paste-able version:
git rev-list HEAD | xargs -n1 git ls-tree -rl | sed -e 's/[^ ]* [^ ]* \(.*\)\t.*/\1/' | sort -u | awk '{ sum += $2 } END { print sum }'
This question doesn't really make sense: in Git, branches are not stored separately. Instead, there is a web of commits, and the objects behind them are stored once and delta-compressed in packfiles. Branches are just pointers to specific commits in this web, so in general they share a lot of the same data.
If you want to know the size in disk-space of a single branch, meaning, the minimal amount of disk space someone will need if they clone the repo taking only that branch, the simplest thing is probably to make a repo just like that, and then ask for the size of that repo.
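One way to try this is sketched below, under a few assumptions: branch_clone_size is a hypothetical helper, the clone is bare so no working tree is counted, and --no-local is used because a plain local clone would copy or hardlink the whole object store regardless of --single-branch.

```shell
# Sketch: size in KiB of the objects a single-branch clone would need.
# "branch_clone_size" is a hypothetical helper; usage: branch_clone_size <repo> <branch>
branch_clone_size() {
    repo=$1
    branch=$2
    tmp=$(mktemp -d) || return 1
    # --no-local forces the normal transport, so only reachable objects are sent.
    if ! git clone --quiet --bare --no-local --single-branch \
            --branch "$branch" "$repo" "$tmp/clone.git"
    then
        rm -rf "$tmp"
        return 1
    fi
    du -sk "$tmp/clone.git/objects" | cut -f1
    rm -rf "$tmp"
}
```

For example, branch_clone_size /path/to/repo main prints the object store size in KiB for a clone that contains only main. The number is approximate, since a fresh clone may pack objects differently from the original repository.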