
Better, simpler example of 'semantic conflict'?

I like to distinguish three different types of conflict from a version control system (VCS):

  • textual
  • syntactic
  • semantic

A textual conflict is one that is detected by the merge or update process. This is flagged by the system. A commit of the result is not permitted by the VCS until the conflict is resolved.

A syntactic conflict is not flagged by the VCS, but the result will not compile. Therefore this should also be picked up by even a slightly careful programmer. (A simple example might be a variable rename by Left and some added lines using that variable by Right. The merge will probably have an unresolved symbol. Alternatively, this might introduce a semantic conflict by variable hiding.)
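To make the variable-hiding case concrete, here is a minimal sketch. All names are hypothetical, and it assumes Right's added line sits far enough from Left's renamed lines that the VCS merges them without a textual conflict. The merge compiles only because an outer variable with the old name happens to exist.

```cpp
#include <cassert>

int total = 100;  // outer variable that already exists in the base code

// Base: the local `total` hides the outer one.
int base_sum()
{
    int total = 0;
    total += 5;
    return total;
}

// Left: renames the local variable from `total` to `sum`.
int left_sum()
{
    int sum = 0;  // L
    sum += 5;     // L
    return sum;   // L
}

// Right: adds a line that uses the local `total`.
int right_sum()
{
    int total = 0;
    total += 5;
    total += 7;  // R
    return total;
}

// Merge: Right's line now binds to the outer `total` instead.
int merged_sum()
{
    int sum = 0;  // L
    sum += 5;     // L
    total += 7;   // R: silently modifies the outer variable
    return sum;   // L
}

void demonstrate_hiding()
{
    const int before = total;
    assert(merged_sum() == 5);     // Right expected 12
    assert(total == before + 7);   // the outer variable changed instead
}
```

Without the outer `total`, the same merge would fail to compile and so would be a syntactic conflict instead.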

Finally, a semantic conflict is not flagged by the VCS, the result compiles, but the code may have problems running. In mild cases, incorrect results are produced. In severe cases, a crash could be introduced. Even these should be detected before commit by a very careful programmer, through either code review or unit testing.

My example of a semantic conflict uses SVN (Subversion) and C++, but those choices are not really relevant to the essence of the question.

The base code is:

int i = 0;
int odds = 0;
while (i < 10)
{
    if ((i & 1) != 0)
    {
        odds *= 10;
        odds += i;
    }
    // next
    ++ i;
}
assert (odds == 13579);

The Left (L) and Right (R) changes are as follows.

Left's 'optimisation' (changing the values the loop variable takes):

int i = 1; // L
int odds = 0;
while (i < 10)
{
    if ((i & 1) != 0)
    {
        odds *= 10;
        odds += i;
    }
    // next
    i += 2; // L
}
assert (odds == 13579);

Right's 'optimisation' (changing how the loop variable is used):

int i = 0;
int odds = 0;
while (i < 5) // R
{
    odds *= 10;
    odds += 2 * i + 1; // R
    // next
    ++ i;
}
assert (odds == 13579);

The result of a merge or update is shown next. SVN does not flag it (which is correct behaviour for the VCS), so it is not a textual conflict. It also compiles, so it is not a syntactic conflict.

int i = 1; // L
int odds = 0;
while (i < 5) // R
{
    odds *= 10;
    odds += 2 * i + 1; // R
    // next
    i += 2; // L
}
assert (odds == 13579);

The assert fails because odds is 37.
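For anyone who wants to reproduce this, the four variants can be wrapped in functions that return odds instead of asserting on it. This is only a sketch: the function names are assumptions added for illustration, and the loop bodies are otherwise the snippets above.

```cpp
#include <cassert>

int base_odds()
{
    int i = 0;
    int odds = 0;
    while (i < 10)
    {
        if ((i & 1) != 0)
        {
            odds *= 10;
            odds += i;
        }
        ++i;
    }
    return odds;
}

int left_odds()
{
    int i = 1; // L
    int odds = 0;
    while (i < 10)
    {
        if ((i & 1) != 0)
        {
            odds *= 10;
            odds += i;
        }
        i += 2; // L
    }
    return odds;
}

int right_odds()
{
    int i = 0;
    int odds = 0;
    while (i < 5) // R
    {
        odds *= 10;
        odds += 2 * i + 1; // R
        ++i;
    }
    return odds;
}

int merged_odds()
{
    int i = 1; // L
    int odds = 0;
    while (i < 5) // R
    {
        odds *= 10;
        odds += 2 * i + 1; // R
        i += 2; // L
    }
    return odds;
}

// Each side is correct on its own; only the combination is wrong.
void check_each_side_alone()
{
    assert(base_odds() == 13579);
    assert(left_odds() == 13579);
    assert(right_odds() == 13579);
}
```

In the merged loop, i takes only the values 1 and 3, so odds becomes 3 and then 3 * 10 + 7 = 37.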

So my question is as follows. Is there a simpler example than this? Is there a simple example where the compiled executable has a new crash?

As a secondary question, are there cases of this that you have encountered in real code? Again, simple examples are especially welcome.

Rhubbarb asked Mar 25 '10 09:03


1 Answer

It is not easy to come up with simple, relevant examples, and this comment sums up best why:

If the changes are close by, then trivial resolutions are more likely to be correct (because those that are incorrect are more likely to touch the same parts of the code and thus result in non-trivial conflicts), and in those few cases where they aren’t, the problem will manifest itself relatively quickly and probably in an obvious way.

[Which is basically what your example illustrates]

But detecting semantic conflicts introduced by merges between changes in widely separated areas of the code is likely to require holding more of the program in your head than most programmers can – or in projects the size of the kernel, than any programmer can.
So even if you did review those 3-way diffs manually, it would be a comparatively useless exercise: the effort would be far disproportionate with the gain in confidence.

In fact, I would argue that merging is a red herring:
this sort of semantic clash between disparate but interdependent parts of the code is inevitable the moment they can evolve separately.
How this concurrent development process is organized – DVCS; CVCS; tarballs and patches; everyone edits the same files on a network share – is of no consequence at all to that fact.
Merging doesn’t cause semantic clashes, programming causes semantic clashes.

In other words, the real cases of semantic conflict I have encountered in real code after a merge were not simple, but rather quite complex.


That being said, the simplest example, as illustrated by Martin Fowler in his article Feature Branch, is a method rename:

The problem I worry more about is a semantic conflict.
A simple example of this is that if Professor Plum changes the name of a method that Reverend Green's code calls. Refactoring tools allow you to rename a method safely, but only on your code base.
So if G1-6 contain new code that calls foo, Professor Plum can't tell in his code base as he doesn't have it. You only find out on the big merge.

A function rename is a relatively obvious case of a semantic conflict.
In practice they can be much more subtle.

Tests are the key to discovering them, but the more code there is to merge the more likely you'll have conflicts and the harder it is to fix them.
It's the risk of conflicts, particularly semantic conflicts, that make big merges scary.
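In C++, a plain call to a renamed function simply fails to compile, which the question would class as a syntactic conflict. A rename can still slip through as a genuine semantic conflict when the other branch overrides the method rather than calling it. Here is a minimal sketch of the merged result, with all class and method names hypothetical:

```cpp
#include <cassert>

struct Shape
{
    virtual ~Shape() = default;
    // L: Plum renamed this virtual method from foo() to area().
    virtual int area() const { return 100; }
};

// L: Plum updated every caller he could see in his own code base.
int describe(const Shape& s)
{
    return s.area();
}

// R: Green's new class was written against the old name. Without the
// override keyword this declares a new, unrelated virtual method.
struct GreenShape : Shape
{
    virtual int foo() const { return 80; }
};

void demonstrate_rename()
{
    GreenShape g;
    assert(describe(g) == 100);  // Green intended 80
    assert(g.foo() == 80);       // the orphaned method still exists
}
```

Had Green marked the method with `override`, the merged code would no longer compile, which turns the semantic conflict back into a syntactic one.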


As Ole Lynge mentions in his answer (upvoted), Martin Fowler wrote a post about "semantic conflict" today (at the time of this edit), including the following illustration:

[Illustration: semantic conflict]

Again, this is based on function renaming, although subtler cases based on internal function refactoring are also mentioned:

The simplest example is that of renaming a function.
Say I think that the method clcBl would be easier to work with if it were called calculateBill.

So the first point here is that however powerful your tooling is, it will only protect you from textual conflicts.

There are, however, a couple of strategies that can significantly help us deal with them

  • The first of these is SelfTestingCode. Tests are effectively probing our code to see if their view of the code's semantics are consistent with what the code actually does
  • The other technique that helps is to merge more often
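As a rough sketch of the first strategy, a self-test pins down what callers expect, so a merge that changes the meaning fails the test even though it merges and compiles cleanly. The billing functions and the 10% tax rule are invented for illustration and do not come from Fowler's post.

```cpp
#include <cassert>

// Hypothetical behaviour the tests were written against:
// the bill in cents includes 10% tax.
long bill_with_tax(long net_cents)
{
    return net_cents + net_cents / 10;
}

// Hypothetical merged behaviour: one branch moved the tax to the callers,
// while the other branch added a caller that still expects tax included.
long bill_after_merge(long net_cents)
{
    return net_cents;
}

// The self-test encodes the expected semantics and counts violations.
int count_failures(long (*bill)(long))
{
    int failures = 0;
    if (bill(0) != 0) ++failures;
    if (bill(250) != 275) ++failures;
    if (bill(1000) != 1100) ++failures;
    return failures;
}

void run_self_tests()
{
    assert(count_failures(bill_with_tax) == 0);     // pre-merge: all pass
    assert(count_failures(bill_after_merge) == 2);  // post-merge: two fail
}
```

The zero-amount case passes either way, so a test suite only catches a conflict if it probes the inputs where the two meanings differ.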

Often people try to justify DVCSs based on how they make feature branching easy. But that misses the issues of semantic conflicts.
If your features are built quickly, within a couple of days, then you'll run into less semantic conflicts (and if less than a day, then it's in effect the same as CI). However we don't see such short feature branches very often.

I think a middle ground needs to be found between short-lived branches and feature branches.
And merging often is key if you have a group of developers on the same feature branch.

VonC answered Oct 19 '22 02:10