 

In the comma operator, is the left operand guaranteed not to be actually executed if it has no side effects?

Tags:

c++

c

gcc

standards

To show the topic I'm going to use C, but the same macro can also be used in C++ (with or without struct), raising the same question.

I came up with this macro:

#define STR_MEMBER(S,X) (((struct S*)NULL)->X, #X) 

Its purpose is to obtain the name of an existing struct member as a string (const char*), so that compilation fails if the member doesn't exist. A minimal usage example:

#include <stdio.h>

struct a {
    int value;
};

int main(void) {
    printf("a.%s member really exists\n", STR_MEMBER(a, value));
    return 0;
}

If value weren't a member of struct a, the code wouldn't compile, and this is what I wanted.

The comma operator evaluates the left operand and then discards its result (if there is one), so my understanding is that this operator is usually used when evaluating the left operand has side effects.

In this case, however, there are no (intended) side effects, but of course it works if and only if the compiler doesn't actually generate code that evaluates the expression; otherwise it would access a struct located at NULL and a segmentation fault would occur.

gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation has no side effects and can therefore be skipped.

Adding volatile to the macro (e.g. because accessing that memory address is the desired side effect) has so far been the only way to trigger the segmentation fault.

So the question is: is there anything in the C and C++ language standards that guarantees compilers will always skip actual evaluation of the left operand of the comma operator when they can be sure that the evaluation has no side effects?

Notes and fixes

I am not asking for a judgment about the macro as it is, or about whether to use it or how to improve it. For the purpose of this question, the macro is bad if and only if it invokes undefined behaviour, i.e. if and only if it is risky because compilers are allowed to generate the “evaluation code” even when it has no side effects.

I already have two obvious fixes in mind: “reifying” the struct, and using offsetof. The former needs an accessible memory area as big as the biggest struct used as the first argument of STR_MEMBER (e.g. a static union might do). The latter should work flawlessly: it yields an offset we aren't interested in and avoids the access problem. Here I'm assuming gcc, because it's the compiler I use (hence the tag), and that its offsetof built-in behaves correctly.

With the offsetof fix the macro becomes:

#define STR_MEMBER(S,X) (offsetof(struct S,X), #X) 

Writing volatile struct S instead of struct S doesn't cause the segfault.

Suggestions about other possible “fixes” are welcome, too.

Added note

Actually, the real use case was in C++, in a struct with static storage. This seems to be fine in C++, but as soon as I tried C with code closer to the original, instead of the version boiled down for this question, I realized that C isn't happy at all with that:

error: initializer element is not constant 

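C requires the initializers of an object with static storage duration to be constant expressions, and a C constant expression may not contain an evaluated comma operator; C++, instead, is fine with that.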

Asked Sep 20 '17 at 20:09 by ShinTakezou




2 Answers

Is there anything in the C and C++ language standards which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator?

It's the opposite. The standard guarantees that the left operand IS evaluated (really, it does; there are no exceptions). The result is then discarded.


Note: for lvalue expressions, "evaluate" does not mean "access the stored value". Instead, it means to work out where the designated memory location is. The other code encompassing the lvalue expression may or may not then go on to access the memory location. The process of reading from the memory location is known as "lvalue conversion" in C, or "lvalue to rvalue conversion" in C++.

In C++, a discarded-value expression (such as the left operand of the comma operator) only has lvalue-to-rvalue conversion performed on it if it is volatile and also meets some other criteria (see C++14 [expr]/11 for details). In C, lvalue conversion does occur for expressions whose result is not used (C11 6.3.2.1/2).

In your example, it is moot whether or not lvalue conversion happens. In both languages X->Y, where X is a pointer, is defined as (*X).Y; in C, the act of applying * to a null pointer already causes undefined behaviour (C11 6.5.3.2/4), and in C++ the . operator is only defined for the case where the left operand actually designates an object (C++14 [expr.ref]/4.2).

Answered Oct 04 '22 at 16:10 by M.M


The comma operator (the C documentation says something very similar) has no such guarantees.

In a comma expression E1, E2, the expression E1 is evaluated, its result is discarded ..., and its side effects are completed before evaluation of the expression E2 begins

(irrelevant information omitted)

To put it simply, E1 will be evaluated, although the compiler may optimize it away under the as-if rule if it can determine that there are no side effects.

Answered Oct 04 '22 at 16:10 by Justin