
How do I run a code formatter over my source without modifying git history?


I am trying to format an entire repo using a code formatter tool. In doing so, I want to keep information about who committed which line, so that commands like git blame still show the correct information. By this, I mean it should show the author that previously edited each line (before it was formatted).

There is the git filter-branch command which allows you to run a command against each revision of the repo starting from the beginning of time.

git filter-branch --tree-filter '\
  npx prettier --write "src/main/web/app/**/**.{js,jsx}" || \
  echo "Error: no JS files found or invalid syntax"' \
  -- --all

It will take forever to run this and really I don't care about the past. I just want to format the master branch going forward without changing ownership of each line. How can I do this? I tried playing with the rev-list at the end and other filter types but it still doesn't work. There must be a way to format the codebase while preserving the author information for each line.

aherriot asked Nov 27 '18 15:11





2 Answers

What you are trying to do is impossible. You cannot, at some point in time, change a line of code, and yet have git report that the most recent change to that line of code is something that happened before that point in time.

I suppose a source control tool could support the idea of an "unimportant change", where you mark a commit as cosmetic and then history analysis would skip over that commit. I'm not sure how the tool would verify that the change really was cosmetic, and without some form of tool enforcement the feature would assuredly be misused resulting in bug introductions potentially being hidden in "unimportant" commits. But really the reasons I think it's a bad idea are academic here - the bottom line is, git doesn't have such a feature. (Nor can I think of any source control tool that does.)

You can change the formatting going forward. You can preserve the visibility of past changes. You can avoid editing history. But you cannot do all three at the same time, so you're going to have to decide which one to sacrifice.

There are actually a couple of downsides to the history rewrite, by the way. You mentioned processing time, so let's look at that first:

As you've noted, the straightforward way to do this with filter-branch would be very time consuming. There are things you can do to speed it up (like giving it a ramdisk for its working tree), but it's a tree-filter and it involves processing of each version of each file.

If you did some pre-processing, you could be somewhat more efficient. For example, you might be able to preprocess every BLOB in the database and create a mapping (where a TREE contains BLOB X, replace it with BLOB Y), and then use an index-filter to perform the substitutions. This would avoid all the checkout and add operations, and it would avoid repeatedly re-formatting the same code files. So that saves a lot of I/O. But it's a non-trivial thing to set up, and still might be time consuming.
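The blob-mapping idea could be sketched roughly as follows. This is only an illustration under several assumptions: format_blob is a stand-in that strips trailing whitespace (swap in a real formatter that reads stdin and writes stdout), the function names are made up for this example, and only .js/.jsx paths are considered. It still rewrites every commit, so try it on a throwaway clone first.

```shell
# Stand-in formatter (assumption): strips trailing whitespace from stdin.
# Swap in any formatter that reads stdin and writes stdout.
format_blob() { sed 's/[[:space:]]*$//'; }

# Usage: build_blob_map <sed-script-path>
# Formats every .js/.jsx blob once and records "old ID -> new ID" as sed rules.
build_blob_map() {
    : > "$1"
    git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
    while read -r type old path; do
        [ "$type" = blob ] || continue
        case "$path" in
            *.js|*.jsx) ;;
            *) continue ;;
        esac
        new=$(git cat-file blob "$old" | format_blob | git hash-object -w --stdin)
        # Only record blobs whose content actually changed.
        [ "$new" = "$old" ] ||
            printf 's/^\\([0-7]* \\)%s /\\1%s /\n' "$old" "$new" >> "$1"
    done
}

# Usage: rewrite_with_map <absolute-sed-script-path>
# Swaps blob IDs in each commit's index; no files are checked out.
# -f overwrites any earlier refs/original backup left by filter-branch.
rewrite_with_map() {
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --index-filter \
        "git ls-files -s | sed -f '$1' | git update-index --index-info" -- --all
}
```

Typical use would be map=$(mktemp); build_blob_map "$map"; rewrite_with_map "$map". Each distinct blob is formatted once no matter how many commits contain it, which is where the I/O saving comes from. One caveat of matching on blob ID alone: a non-JS file with byte-identical content to a mapped blob would be swapped as well.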

(It's possible to write a more specialized tool based on this same principle, but AFAIK nobody has written one. There is precedent that more specialized tools can be faster than filter-branch...)

Even if you come to a solution that will run fast enough, bear in mind that the history rewrite will disturb all of your refs. Like any history rewrite, it will be necessary for all users of the repo to update their clones. For something this sweeping, I recommend throwing the clones out before you start the rewrite and re-cloning afterward.

That also means that anything that depends on commit IDs will be broken. (That could include build infrastructure, release documentation, etc., depending on your project's practices.)

So, a history rewrite is a pretty drastic solution. And on the other hand, it also seems drastic to suppose that formatting the code is impossible simply because it wasn't done from day 1. So my advice:

Do the reformatting in a new commit. If you need to use git blame, and it points you to the commit where reformatting occurred, then follow up by running git blame again on the reformat commit's parent.
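As a minimal sketch of that follow-up step, a small helper can run git blame starting from the parent of the reformat commit. The name blame_before is hypothetical; the only Git feature it relies on is the standard <commit>^ parent syntax.

```shell
# Usage: blame_before <reformat-commit> <file> [extra git-blame options]
# Blames the file as it looked just before the reformat commit.
blame_before() {
    commit=$1
    file=$2
    shift 2
    git blame "$@" "${commit}^" -- "$file"
}
```

For example, blame_before abc1234 src/app.js -L 10,20 would blame lines 10 to 20 of the pre-format version (abc1234 is a placeholder). Line numbers may differ between the formatted and unformatted file, so you may need to locate the line again in the older version.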

Yeah, it sucks. For a while. But a given piece of history tends to become less important as it ages, so from there you just let the problem gradually diminish into the past.

Mark Adelsberger answered Sep 18 '22 14:09



You can make git blame ignore certain commits, such as those that only do mass reformatting:

Create a file .git-blame-ignore-revs like:

# Format commit 1 SHA:
1234af5.....
# Format commit 2 SHA:
2e4ac56.....

Then do

git config blame.ignoreRevsFile .git-blame-ignore-revs 

That way you don't have to pass the --ignore-revs-file option every time you run git blame.
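The two steps can be wrapped in a small helper, sketched here under a few assumptions: the function name ignore_reformat_commit is made up, the ignore file lives at the repository root, and Git is version 2.23 or newer (older versions do not know this setting). The helper expands whatever revision you pass into the full hash, since the ignore file expects unabbreviated object names.

```shell
# Usage: ignore_reformat_commit <commit> [comment]
# Appends the commit's full hash to .git-blame-ignore-revs and
# points blame.ignoreRevsFile at that file for this repository.
ignore_reformat_commit() {
    sha=$(git rev-parse --verify "$1^{commit}") || return 1
    top=$(git rev-parse --show-toplevel) || return 1
    printf '# %s\n%s\n' "${2:-Reformat commit}" "$sha" >> "$top/.git-blame-ignore-revs"
    git config blame.ignoreRevsFile .git-blame-ignore-revs
}
```

After running ignore_reformat_commit HEAD "Run prettier over src" right after the formatting commit, a plain git blame should attribute the reformatted lines to whoever touched them before. The git config call only affects the local clone, so each contributor has to run it once, while the ignore file itself can be committed and shared.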

Upvote https://github.com/github/feedback/discussions/5033 to get that feature into GitHub's web blame viewer.

kxr answered Sep 19 '22 14:09
