How can I format the code in a multi-branch project?

Tags:

git

code-formatting

branch

merge

Our Git repository holds hundreds of thousands of lines of code, and the formatting has bugged me ever since I joined the project two years ago. It not only bugs me: because devs randomly "fix" the formatting, merges turn into a headache whenever the code formatting was applied on one side only. Reformatting the code is a two-minute task, but it results in merge-conflict hell, too. I recently merged master into a long-lived feature branch and tried two approaches:

Format code in master, merge into the feature branch: the 3-way merge tool meld gives me exactly the mess I mentioned above. It doesn't detect function boundaries. Really no fun to merge.
Format code in master, format code in the feature branch, merge master: I still get 30 files with conflicts, but they are much easier to sort out.

Now I wonder whether it's worth merging this way, as there are another 15 branches that will all need the exact same code reviews, and manual merging is error-prone. Is there some way of doing this without getting these merge conflicts?

asked Oct 30 '17 19:10

Giszmo

2 Answers

Recipe with assumptions

(note: I have not tested any of this)

We'll assume the reformatter is in ~/Downloads/android-studio/bin/format.sh and (note: apparently this is a bad assumption!) that it reads stdin, writes stdout, and works on one file at a time. (It's possible, but very difficult, to make this work with something that needs more than one file at a time; you cannot use this recipe for that case. Git's basic filtering mechanism requires that each filter simply read stdin and write stdout. By default Git does not treat a failing filter as an error: if the filter exits with a failure status, Git carries on with the unfiltered content.)
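If the formatter turns out to rewrite a file in place instead of reading stdin, a small wrapper can adapt it to the stdin/stdout contract described above. This is only a sketch under assumptions: `filter_via_tempfile` is a hypothetical helper name, and the formatter is whatever command you pass as arguments (it receives the scratch file's path as its last argument).

```shell
#!/bin/sh
# Hypothetical helper: adapt a formatter that rewrites a file in place
# into a filter that reads stdin and writes stdout.
# Usage: filter_via_tempfile <formatter command...>
filter_via_tempfile() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # save stdin to a scratch file
    if "$@" "$tmp" </dev/null >/dev/null 2>&1; then
        cat "$tmp"                    # emit the reformatted content
        status=0
    else
        status=1                      # emit nothing and report the failure
    fi
    rm -f "$tmp"
    return "$status"
}
```

A script that ends by calling this function with the real formatter could then serve as the clean command. Note that the scratch file has no extension, which may matter for formatters that pick a language from the file name. Setting `required = true` in the filter section makes Git treat a non-zero exit as an error instead of passing the content through unchanged.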

Choose where to run the filter as well; here I've set it up as the "clean" filter only.

In ~/.gitconfig or .git/config, add the definition for the filter:

[filter "my-xyz-language-formatter"]
    clean = ~/Downloads/android-studio/bin/format.sh
    smudge = cat

(This assumes that cat copies its stdin to its stdout unchanged, which is true on any Unix-like system.)

Then, create a .gitattributes file if needed. It will apply to the directory you create it in, and all sub-directories, unless overridden in those sub-directories, so place it in the highest sensible location, usually the root of the repository, but sometimes underneath a source/ or src/ or whatever directory. Add line(s) to direct file(s) matching some pattern(s) through your formatter. We'll assume here that all files named *.xyz should be formatted:

*.xyz   filter=my-xyz-language-formatter

This filter will now apply to all extractions and insertions of *.xyz files. The gitattributes documentation talks about these being applied at check-out and check-in time, but that's not quite precise. Instead, a clean filter is applied whenever Git copies from work-tree to index (essentially, git add, which happens well before git commit unless you use git commit -a or similar flags). A smudge filter is applied whenever Git copies from index to work-tree (essentially, git checkout, but also some additional cases, such as git reset --hard).
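The timing can be checked in a throwaway repository. The sketch below is an illustration, not part of the recipe: `tr a-z A-Z` stands in for a real formatter, and `sample.xyz` is a made-up file name. It stages a file and then compares the copy in the index with the copy in the work-tree.

```shell
#!/bin/sh
# Throwaway demo: the clean filter runs at `git add`, not at `git commit`.
# `tr a-z A-Z` is a stand-in for a real formatter (an assumption).
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config filter.my-xyz-language-formatter.clean 'tr a-z A-Z'
git -C "$demo" config filter.my-xyz-language-formatter.smudge cat
printf '*.xyz filter=my-xyz-language-formatter\n' > "$demo/.gitattributes"

printf 'hello\n' > "$demo/sample.xyz"
git -C "$demo" add sample.xyz                      # clean filter runs here

staged=$(git -C "$demo" cat-file -p :sample.xyz)   # blob stored in the index
worktree=$(cat "$demo/sample.xyz")                 # work-tree copy is untouched
echo "index: $staged / work-tree: $worktree"
```

No commit has been made at this point, yet the index already holds the filtered content while the work-tree file keeps what was typed.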

Note that spinning up one filter for each file can be quite slow. There's a "long running filter process" protocol you can use if you have a lot of control over the filter, which can speed this up (especially on Windows). That's beyond the scope of this answer, though.

Running git merge normally does not use the filters (it works on the copies that are already in the index, which is outside the filtering step). However, adding -X renormalize to a standard merge will make git merge do the "virtual check-in and check-out" described below, so that it will apply the filters. This happens for all three commits involved in the merge (and in both directions—clean and smudge—so it's roughly 6x slower than for just one commit).

Description (see below)

Git itself is only partially helpful here.

Fundamentally, the problem is that Git is stupid and line-oriented: it runs git diff from the merge base commit to each tip commit. If one or both of these git diffs sees a lot of formatting changes, it considers those significant and worthy of applying to the base. It has no semantic knowledge of the input code.

(Since you can take over the entire merge process, you could write a smarter merge that does use semantic analysis. This is pretty difficult, though. The only system I know of that does this, or something approaching this, is Ira Baxter's commercial software, and I've never actually used that; I just understand the theory behind it.)

There is a solution that does not depend on making Git smarter. If you have a semantic analyzer that outputs consistently formatted code, regardless of the input form, you can feed all three versions—B for base, L for left or local or --ours, and R for right or remote or other or --theirs—into this formatter:

reformat < B > B.formatted
reformat < L > L.formatted
reformat < R > R.formatted

Now you can have Git merge all three formatted versions, rather than merging the original possibly-not-yet-formatted (but maybe formatted) versions.
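To see the effect outside of a real merge, the three files can be merged by hand with `git merge-file`. This is a sketch under assumptions: the `reformat` function is a trivial stand-in (it strips leading spaces and squeezes runs of spaces), and the file contents are invented for the illustration.

```shell
#!/bin/sh
# Stand-in for a real formatter (assumption): strip leading spaces and
# squeeze runs of spaces. Reads stdin, writes stdout.
reformat() { sed 's/^ *//' | tr -s ' '; }

work=$(mktemp -d)
printf 'a  = 1\nb  = 2\nc  = 3\nd  = 4\ne  = 5\n'  > "$work/B"  # base, unformatted
printf 'a = 1\nb = 2\nc = 3\nd = 4\ne = 50\n'      > "$work/L"  # ours: formatted, e changed
printf 'a  = 10\nb  = 2\nc  = 3\nd  = 4\ne  = 5\n' > "$work/R"  # theirs: unformatted, a changed

# Raw merge: every line differs between B and L, so R's edit collides with it.
if git merge-file -p "$work/L" "$work/B" "$work/R" >/dev/null 2>&1; then
    raw=clean
else
    raw=conflict
fi

for f in B L R; do reformat < "$work/$f" > "$work/$f.formatted"; done

# Formatted merge: only the two real edits remain, and they do not overlap.
merged=$(git merge-file -p "$work/L.formatted" "$work/B.formatted" "$work/R.formatted") \
    && formatted=clean || formatted=conflict
echo "raw: $raw, formatted: $formatted"
```

`git merge-file` takes the files in the order current, base, other, and its exit status is the number of conflicts, which is why a plain success test is enough here.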

The result of this merge will, of course, be re-formatted. But presumably this is what you'd like anyway.

The way to achieve this with Git's built-in tools is to use what it calls smudge and clean filters. A smudge filter is applied to files as they are extracted from the repository into the work-tree. A clean filter is applied to files whenever they go from the work-tree into the repository.

In this case, the smudge filter can be "do nothing to the data", preserving exactly what was committed. The clean filter can be the reformatter. Or, if you prefer, the smudge filter can be the reformatter, and the clean filter can be the reformatter again, or a no-op filter. You set this up in .gitattributes, by defining a filter for particular files by path name, and define the filter driver in .git/config or your main (user or system-wide) .gitconfig.

Once you have all that set up, you can run git merge -X renormalize. Git will extract the B, L, and R versions as usual, but then run them through a "virtual check-out and check-in" step, making three temporary commits¹ (B.formatted and so on). It then does the merge using the three temporary commits, rather than the original three commits.
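The whole flow can be tried end to end in a scratch repository. Everything named here is an assumption made for the sketch: the file `file.xyz`, the branch `feature`, and the `sed`/`tr` pipeline standing in for a real formatter. The base and the feature branch are committed unformatted; the original branch then adopts the filter, commits a formatted file, and merges with `-X renormalize`.

```shell
#!/bin/sh
# Scratch repository; file names, branch name and the stand-in formatter
# are assumptions for this sketch.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }

g init -q
printf 'a  = 1\nb  = 2\nc  = 3\nd  = 4\ne  = 5\n' > "$repo/file.xyz"
g add file.xyz && g commit -q -m 'base: unformatted'

g checkout -q -b feature
printf 'a  = 10\nb  = 2\nc  = 3\nd  = 4\ne  = 5\n' > "$repo/file.xyz"
g commit -q -am 'feature: change a, still unformatted'

g checkout -q -                        # back to the original branch
g config filter.my-xyz-language-formatter.clean "sed 's/^ *//' | tr -s ' '"
g config filter.my-xyz-language-formatter.smudge cat
printf '*.xyz filter=my-xyz-language-formatter\n' > "$repo/.gitattributes"
printf 'a = 1\nb = 2\nc = 3\nd = 4\ne = 50\n' > "$repo/file.xyz"
g add .gitattributes file.xyz && g commit -q -m 'formatted, change e'

# All three versions are normalized through the filters before merging.
g merge -q --no-edit -X renormalize feature && result=clean || result=conflict
merged=$(g cat-file -p HEAD:file.xyz)
echo "merge: $result"
```

The filter definition lives in the repository's own config here, so every clone that performs such merges needs the same definition.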

The hard part is finding a reformatter that does just what you want / need. Some modern systems have them, e.g., gofmt or clang-format. If there's one that does what you need, it just becomes a matter of plugging all this together—and getting buy-in from the rest of your group, that this reformatting is a good idea.

¹Technically it just makes tree objects; there's no need for actual commits.

answered Oct 19 '22 10:10

torek

While torek probably got me on a good track, it did not help me get the reformatting done across branches. The problem was that the filter was applied after Git had added these

<<<< HEAD
bla foo 123
====
bla 123
>>>> otherBranch

blocks, so the filter would indent the conflict markers ... which is not good.
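One conceivable workaround, which I have not tried against the real repository, is to make the filter pass conflicted content through untouched. The sketch below uses two hypothetical helpers, `has_conflict_markers` and `format_unless_conflicted`; it looks for Git's default seven-character start marker and takes a stdin/stdout formatter command as its arguments.

```shell
#!/bin/sh
# Hypothetical guard: succeeds if the file contains a line starting with
# Git's default conflict start marker (seven '<' and a space).
has_conflict_markers() {
    grep -q '^<<<<<<< ' "$1"
}

# Hypothetical filter wrapper: copy conflicted content through unchanged,
# run everything else through the given stdin/stdout formatter command.
format_unless_conflicted() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # buffer stdin so it can be inspected
    if has_conflict_markers "$tmp"; then
        cat "$tmp"
    else
        "$@" < "$tmp"
    fi
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

A file that legitimately contains such a line would be skipped as well, so this trades a little formatting coverage for not mangling the markers.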

While this probably has some solution, I went with a custom merge tool:

#!/bin/bash

BASE=$1
LOCAL=$2
REMOTE=$3
MERGED=$4

if echo "$BASE" | grep -q '\.java'; then
    echo "Normalizing java file"
    # Quote the paths so file names containing spaces survive.
    astyle "$BASE"
    astyle "$LOCAL"
    astyle "$REMOTE"
    astyle "$MERGED"
fi


meld "$LOCAL" "$BASE" "$REMOTE" --output "$MERGED"

configured in .gitconfig as:

[merge]
    tool = customMergeTool
[mergetool "customMergeTool"]
    cmd = /path/to/customMergeTool.sh \"$BASE\" \"$LOCAL\" \"$REMOTE\" \"$MERGED\"

With my approach, Git still reports conflicts, but when handled by my script, 40 of my 100 cases turn out to have no merge conflicts. torek's approach could probably speed things up there, but I ran into serious issues merging the other 40 files, so I gave it up for now.

answered Oct 19 '22 10:10

Giszmo
