
Git Squash sequential commits by an author to compress history

My team has been working on a long-running feature branch that now has hundreds of commits, and I need to merge it into master for a production release.

I do not want to carry that many commits over from the branch, since many of them were bug fixes that change only a couple of lines each.

GitHub's PR creation page limits the commits shown to 250.

From GitHub: "This comparison is big! We’re only showing the most recent 250 commits"

Hence, I decided to compress the history so that each run of sequential commits from one author gets squashed into a single commit.

For example, say we have commits like A - A - A - B - B - A - C - D - D - B - B - A from authors A, B, C and D. The resulting commit log should then be A(3) - B(2) - A - C - D(2) - B(2) - A, where X(N) is a single commit squashing N commits from author X.

Edit: I understand that this will need a script, and that is what I am looking for. I do not want to go through an interactive rebase to do this.

asked Nov 10 '22 00:11 by Ranjan


1 Answer

There is nothing built in to do this; you would have to write your own script.

To do so, start with git rev-list or git log (both do essentially the same thing, with somewhat different command-line options) to iterate through all the commits you want to scan. Your goal here is to copy-but-squash commits, onto a new temporary branch. For instance, assuming the commits were all made on branch feature and are to be merged into branch target, you can get the list of commits to inspect with:

git rev-list --reverse --topo-order target..feature > /tmp/list

The output here is a list of commit SHA-1 IDs. For each commit you will want to find the author, and probably the commit message:

while read sha1; do
    author_name=$(git log -n 1 --format=%an $sha1)
    ...
done < /tmp/list
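
As a possible shortcut, git log can print the hash and the author name together, which avoids running git log once per commit. The following is only a sketch of that idea: list_commits_with_authors is a made-up helper name, and target and feature are the same assumed branch names as above.

```shell
# Hypothetical helper (not part of Git): print "<sha><TAB><author name>"
# for every commit in $1..$2, oldest first, with a single git log call.
list_commits_with_authors() {
    git log --reverse --topo-order --format='%H%x09%an' "$1..$2"
}

# Possible usage, splitting each line on the tab character:
#   list_commits_with_authors target feature |
#   while IFS="$(printf '\t')" read -r sha1 author_name; do
#       printf '%s was written by %s\n' "$sha1" "$author_name"
#   done
```

Note that a while loop fed by a pipe usually runs in a subshell, so variables set inside it are lost afterwards; that is one reason the temporary-file approach is easier to build on.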

If the current author name is the same as the previous author name, you just want to accumulate this commit; if it differs, you want to emit a squashed commit for the run that just ended. Since the previous author name is initially unset, it expands to the empty string, so the first commit never matches a previous author. You must handle that case specially (along with the final commit), because you always want to accumulate the first commit, and you need to take action both on author changes and after the last commit. Hence the ... section is a bit complicated.

We also need some setup work to create and get onto a temporary branch that starts at the tip of branch target. Rather than using a named temporary branch, we'll use an anonymous one (a detached HEAD) here.
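
The accumulate-then-flush logic is easier to see without any Git involved. Here is a minimal sketch that reads one author name per line and prints each run; group_runs and emit_run are illustrative names only, and the real script flushes by making a commit rather than printing.

```shell
# Illustrative only: print one run as "name(count)", or just "name" for 1.
emit_run() {
    if [ "$2" -gt 1 ]; then
        printf '%s(%s)\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Read one author name per line on stdin and emit each run of
# consecutive identical names.
group_runs() {
    prev="" count=0
    while IFS= read -r name; do
        # Author changed and a run is pending: flush it.
        if [ "$count" -gt 0 ] && [ "$name" != "$prev" ]; then
            emit_run "$prev" "$count"
            count=0
        fi
        prev=$name
        count=$((count + 1))
    done
    # The last run is never followed by an author change, so flush it here.
    if [ "$count" -gt 0 ]; then
        emit_run "$prev" "$count"
    fi
}

# printf '%s\n' A A A B B A C D D B B A | group_runs | paste -sd' ' -
# prints: A(3) B(2) A C D(2) B(2) A
```

The two flush points (on an author change, and once more after the loop) are the same two places where the full script has to make a squashed commit.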

Finally, making the squashed commit itself is a bit tricky: the easiest way is to use a plumbing command, git commit-tree, with the tree of the last commit in the run, and then to advance the temporary branch to the new commit.

Putting all of this together, we get the following completely-untested code:

# Add a new squash-style commit whose tree is that of commit $1.
# The commit message is read from stdin.
make_squash_commit() {
    local sha1=$1 tree new_sha1

    tree=$(git rev-parse "$sha1^{tree}")
    new_sha1=$(git commit-tree "$tree" -p HEAD)
    git update-ref -m "squash $sha1" HEAD "$new_sha1"
}

set -e
git rev-list --reverse --topo-order target..feature > /tmp/list
git checkout --detach target
: > /tmp/accum_log
prev_sha1=""
prev_name=""
while read -r sha1; do
    author_name=$(git log -n 1 --format=%an "$sha1")
    # author changed: squash the run that just ended
    if [ -n "$prev_sha1" ] && [ "$author_name" != "$prev_name" ]; then
        make_squash_commit "$prev_sha1" < /tmp/accum_log
        : > /tmp/accum_log
    fi
    prev_name="$author_name"
    prev_sha1=$sha1
    git log -n 1 --format="%B" "$sha1" >> /tmp/accum_log
done < /tmp/list
if [ -z "$prev_sha1" ]; then
    echo "Warning: no commits found to squash!" >&2
    prev_sha1=$(git rev-parse target)
fi
# squash final run, then give anonymous branch a name
make_squash_commit "$prev_sha1" < /tmp/accum_log
git checkout -b temp_branch
# update-ref moved HEAD without touching the index or work-tree; sync them
git reset --hard
rm /tmp/list /tmp/accum_log

There's a somewhat deliberate flaw in this script: it makes all the new commits with you as the author and committer of each one, using the current date and time (which is appropriate since you're making a mess of multiple authors' commits, squashing them regardless of whether they agreed to it). You can "fix" this by setting the appropriate environment variables (GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, and the GIT_COMMITTER_* equivalents) at the git commit-tree step (see its documentation).
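
If you do want to keep the original authorship, one possible variant of the squash function is sketched below. It is an untested-in-anger assumption rather than part of the script above: make_squash_commit_as_author is a made-up name, and it copies the author name, email and date from the last commit of the run while leaving you as the committer.

```shell
# Hypothetical variant of make_squash_commit: $1 is the last original
# commit of a run, and the squashed message is read from stdin.
make_squash_commit_as_author() {
    sha1=$1
    # Look up the original author details first ...
    a_name=$(git log -n 1 --format=%an "$sha1")
    a_email=$(git log -n 1 --format=%ae "$sha1")
    a_date=$(git log -n 1 --format=%aD "$sha1")    # RFC 2822 date
    tree=$(git rev-parse "$sha1^{tree}")
    # ... then pass them to commit-tree through its environment only.
    new_sha1=$(GIT_AUTHOR_NAME=$a_name \
               GIT_AUTHOR_EMAIL=$a_email \
               GIT_AUTHOR_DATE=$a_date \
               git commit-tree "$tree" -p HEAD)
    git update-ref -m "squash $sha1" HEAD "$new_sha1"
}
```

Since runs are grouped by author name only, the email and date here are simply those of the run's last commit; whether that is the right choice is a judgment call.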

answered Nov 15 '22 05:11 by torek