In this thread, the top-rated answer received a lot of upvotes and even a bounty. It proposes the following algorithm:
void RemoveSpaces(char* source)
{
    char* i = source;
    char* j = source;
    while(*j != 0)
    {
        *i = *j++; // UB?
        if(*i != ' ')
            i++;
    }
    *i = 0;
}
My knee-jerk reaction was that this code invokes undefined behavior, because i and j point at the same memory location, and an expression such as *i = *j++; would then access the same variable twice, for purposes other than determining what to store, with no sequence point in between. Even though i and j are two different variables, they initially point at the same memory location.
However, I am not certain, as I don't quite see how two unsequenced accesses of the same memory location could cause any harm in practice.
Am I correct in stating that this is undefined behavior? And if so, are there any examples of how relying on such UB could cause harmful behavior?
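For comparison, here is a minimal sketch of the same loop with the read, the pointer increment, and the store split into separate statements, so that the order of the accesses no longer depends on how *i = *j++; is evaluated. The name remove_spaces_sequenced and the temporary c are hypothetical, introduced only for this illustration; the result is intended to match what RemoveSpaces produces.

```c
/* Hypothetical variant of RemoveSpaces: same in-place compaction,
   but each access sits in its own full expression. */
void remove_spaces_sequenced(char *source)
{
    char *dst = source;        /* where the next kept character goes */
    const char *src = source;  /* next character to examine */

    while (*src != '\0')
    {
        char c = *src;  /* read the current character first */
        src++;          /* then advance the read pointer */
        if (c != ' ')
        {
            *dst = c;   /* then store, only for characters that are kept */
            dst++;
        }
    }
    *dst = '\0';        /* terminate the compacted string */
}
```

Calling it on a writable buffer such as char buf[] = " a b  c "; should leave "abc" in buf.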
EDIT
Relevant part of the C standard which would label this as UB is:
C99 6.5
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
C11 6.5
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
The actual meaning of the text should be the same in both versions of the standard, but I believe the C99 text is far easier to read and understand.
There are two situations where accessing the same object twice without an intervening sequence point is undefined behaviour:
If you modify the same object twice. For example:
int x = (*p = 1, 1) + (*p = 2, 100);
Obviously you wouldn't know whether *p is 1 or 2 after this, but the wording in the C standard says that it is undefined behaviour, even if you write
int x = (*p = 1, 1) + (*p = 1, 100);
so storing the same value twice doesn't save you.
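One way to sidestep this case, sketched here under the assumption that the two stores were meant to happen in a particular order, is to give each store its own statement. The function name store_in_order is hypothetical and exists only for this illustration.

```c
/* Hypothetical helper: performs the two stores from the example above
   as separate full expressions, so their order is fixed by the language. */
int store_in_order(int *p)
{
    *p = 1;          /* first store; the end of the statement is a sequence point */
    *p = 2;          /* second store, sequenced after the first */
    return 1 + 100;  /* the values the two comma expressions would have yielded */
}
```

Written this way, *p holds 2 after the call and the function returns 101.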
If you modify the object, but also read it without using the value read to determine the new value of the object. That means
*p = *p + 1;
is fine, because you read *p and you modify *p, but you read *p in order to determine the value stored into *p.
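To contrast the two sides of this rule, the sketch below pairs the well-defined form with a rewrite of a classic problematic statement, a[i] = i++;, which appears only in a comment. The helper names increment and store_index_then_advance are hypothetical, and the rewrite assumes one possible intended meaning (store the old index, then advance).

```c
/* Well-defined: *p is read only to compute the value stored back into *p. */
void increment(int *p)
{
    *p = *p + 1;
}

/* A statement such as  a[i] = i++;  would be undefined: i is modified by
   i++ and is also read to select the element, and that read does not
   determine the new value of i. The hypothetical helper below spells out
   one possible intent with the read and the modification separated. */
int store_index_then_advance(int *a, int i)
{
    a[i] = i;   /* uses the old value of i */
    i++;        /* the modification happens in its own full expression */
    return i;   /* caller receives the advanced index */
}
```

With int v = 41;, increment(&v) leaves 42 in v; with int a[3] = {0}, store_index_then_advance(a, 1) sets a[1] to 1 and returns 2.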