Git - get all commits and blobs they created

Tags: git, commit, blob

Is there a git command that can output, for every commit:

  1. id
  2. subject
  3. the blobs it created, with their paths and sizes (like git ls-tree -l -r <commit>, but only for created blobs)
tig asked Aug 22 '09 02:08


5 Answers

To get all commits, one line per commit:

git rev-list --all --pretty=oneline

Then split each line at the first space (a split with a limit of 2) to get the commit id and the message.
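
The split described above can be sketched in shell. This is only an illustration: the function name split_oneline and the tab-separated output are assumptions, not part of the original answer. read puts the first word into its first variable and the rest of the line into the second, which amounts to a split with a limit of 2.

```shell
# Hypothetical helper: reads "<sha> <subject>" lines on stdin and prints
# the commit id and the subject separated by a tab.
split_oneline() {
  while IFS=' ' read -r sha subject; do
    printf '%s\t%s\n' "$sha" "$subject"
  done
}

# Usage inside a repository:
#   git rev-list --all --pretty=oneline | split_oneline
```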

To get the blobs created by a commit (recurse into subdirectories, show merge commits, detect renames and copies, omit the commit id on the first line):

git diff-tree -r -c -M -C --no-commit-id <commit-sha>

After parsing every line and excluding some of them, we get the list of new blobs and their paths for the commit.
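
One way to do that parsing is sketched below. It assumes the raw diff-tree line format (":<old mode> <new mode> <old id> <new id> <status>", a tab, then the path) and treats only status A as "created"; the function name added_blobs is hypothetical. Combined-diff lines for merges (they start with "::") are skipped here, so merges would need separate handling.

```shell
# Hypothetical helper: keeps only added (status A) entries from raw
# git diff-tree output and prints "<blob id><TAB><path>".
added_blobs() {
  awk -F'\t' '
    /^:[^:]/ {
      # meta[1..5]: old mode, new mode, old id, new id, status
      split($1, meta, " ")
      if (meta[5] == "A") print meta[4] "\t" $2
    }'
}

# Usage inside a repository:
#   git diff-tree -r -c -M -C --no-commit-id <commit-sha> | added_blobs
```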

The last step is to get the blob sizes:

git cat-file --batch-check < <list-of-blob-shas>

This output needs a bit of parsing as well.
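
As a rough sketch of that last parsing step: git cat-file --batch-check prints one "<id> <type> <size>" line per object by default, or "<id> missing" for ids it cannot find. The helper below (the name blob_sizes is an assumption) keeps only blobs and prints the id and the size in bytes.

```shell
# Hypothetical helper: reads git cat-file --batch-check output on stdin
# and prints "<blob id><TAB><size in bytes>" for blobs only.
blob_sizes() {
  awk '$2 == "blob" { print $1 "\t" $3 }'
}

# Usage inside a repository, with one blob id per line in the input file:
#   git cat-file --batch-check < list-of-blob-shas.txt | blob_sizes
```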

tig answered Sep 29 '22 21:09


Relying on git rev-list is not always enough because it

List[s] commits that are reachable by following the parent links from the given commit(s) [..]

(git help rev-list)

Thus, unless you pass --all, it does not list commits that are on another branch, and even with --all it does not list commits that are not reachable from any ref (perhaps they were created by a rebase and/or detached-head actions).

Similarly, git log by default just follows the parent links from the currently checked-out commit. Again, you don't see commits that are only referenced by other branches or that are in a dangling state.

You can really get all commits with a command like this:

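for i in $( { find .git/objects -type f |
                sed -n 's@^.*objects/\(..\)/\(.*\)$@\1\2@p' ;
              git verify-pack -v .git/objects/pack/*.idx 2>/dev/null |
                grep ' commit ' |
                cut -f1 -d' ' ; } | sort -u )
do
  # Loose objects may also be trees, blobs or tags: keep commits only
  [ "$(git cat-file -t "$i" 2>/dev/null)" = commit ] || continue
  git log -1 --pretty=format:'%H %P %ai %s%n' "$i"
done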

To keep it simple, the loop body prints for each commit one line containing its hash, the parent hash(es), date and subject. Note, to iterate over all commits you need to consider packed and not-yet packed objects.
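
As an alternative to walking .git/objects by hand (this is a different technique, not the one used in the loop above), newer git versions can enumerate every object, packed or loose, with git cat-file --batch-all-objects --batch-check. A minimal sketch, assuming that option is available in your git and using a hypothetical helper name:

```shell
# Hypothetical helper: reads "<id> <type> <size>" lines on stdin and
# prints only the ids of commit objects.
all_commit_ids() {
  awk '$2 == "commit" { print $1 }'
}

# Usage inside a repository (reachable and unreachable commits alike):
#   git cat-file --batch-all-objects --batch-check | all_commit_ids
```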

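You can print the referenced blobs (and only the created ones) by calling git diff-tree -r "$i" from the loop body and grepping for a capital A in the fifth column. The -r option makes diff-tree recurse into subdirectories so that it reports blobs rather than trees.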

maxschlepzig answered Sep 29 '22 22:09


You can get everything but size out of the box. This one is pretty close:

git log --name-status
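
To narrow that output down to created files only, the name-status lines can be filtered for status A. The sketch below assumes the log is printed with a custom header format ('commit %H %s') so that header lines are easy to recognise; both that format and the function name are choices made for this example.

```shell
# Hypothetical helper: reads git log --name-status output whose header
# lines look like "commit <sha> <subject>" and prints "<sha><TAB><path>"
# for every added (status A) file.
added_paths_per_commit() {
  awk -F'\t' '
    /^commit / { split($0, h, " "); sha = h[2]; next }
    $1 == "A"  { print sha "\t" $2 }'
}

# Usage inside a repository:
#   git log --all --name-status --pretty=format:'commit %H %s' |
#     added_paths_per_commit
```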
Dustin answered Sep 29 '22 20:09


One solution based on tig's answer:

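#!/usr/bin/perl
use strict;
use warnings;

foreach my $rev (`git rev-list --all --pretty=oneline`) {
  my $tot = 0;
  # Keep only the commit id: drop everything from the first whitespace on
  (my $sha = $rev) =~ s/\s.*$//s;
  foreach my $line (`git diff-tree -r -c -M -C --no-commit-id $sha`) {
    my $blob = (split /\s/, $line)[3];
    next unless defined $blob;
    # String comparison (eq, not ==): an all-zero id means the file was deleted
    next if $blob eq "0000000000000000000000000000000000000000";
    my $size = `echo $blob | git cat-file --batch-check`;
    $size = (split /\s/, $size)[2];
    $tot += int($size // 0); # no size is reported for missing objects
  }
  print "$tot $rev" if $tot > 1000000; # Show only if > 1 MB (10^6 bytes)
}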

Maybe not the best code, but should get you most of the way.

Gavin Brock answered Sep 29 '22 20:09


Another useful command when searching for lost commits:

git fsck --lost-found

will show dangling commits. I needed this to find a commit I had wiped out with an ill-timed reset --hard.

But don't take my word for it:

https://www.kernel.org/pub/software/scm/git/docs/git-fsck.html
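
A small sketch of how the fsck output could be reduced to commit ids only, assuming it reports such objects as lines of the form "dangling commit <id>" (other object types get their own "dangling ..." lines). The function name is hypothetical.

```shell
# Hypothetical helper: reads git fsck output on stdin and prints the ids
# of dangling commits, one per line.
dangling_commits() {
  awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}

# Usage inside a repository:
#   git fsck --lost-found | dangling_commits
```

Each id printed this way can then be inspected with git log -1 <id> to find the commit you are looking for.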

starsinmypockets answered Sep 29 '22 20:09