
Why are the 'dereference' and the 'address of' operators on the left?

In C (and some other C-like languages) we have two unary operators for working with pointers: the dereference operator (*) and the 'address of' operator (&). Both are left (prefix) unary operators, which makes the order of operations unclear when they are combined with postfix operators, for example:

*ptr->field

or

*arr[id]

The order of operations is strictly defined by the standard, but from a human perspective it is confusing. If the * operator were a right (postfix) unary operator, the order would be obvious and wouldn't require extra parentheses:

ptr*->field vs ptr->field*

and

arr*[id] vs arr[id]*

So is there a good reason why the operators are left unary instead of right? One thing that comes to mind is the declaration of types. Left operators stay near the type name (char *a vs char a*), but there are type declarations that already break this rule (char a[num], char (*a)(char), etc.), so why bother?

Obviously, this approach has some problems of its own, such as

 val*=2

which could be read either as the *= shorthand for val = val * 2 or as dereference-and-assign, val* = 2. However, this could easily be solved by requiring whitespace between the * and = tokens when dereferencing. Once again, nothing groundbreaking, since there is a precedent for such a rule (- -a vs --a).

So why are they left instead of right operators?

Edit: I want to point out that I asked this question because many of the weirder aspects of C have interesting explanations for why they are the way they are, like the existence of the -> operator, the type declarations, or indexing starting from 0. The reasons may no longer be valid, but they are still interesting in my opinion.

RuRo asked Oct 15 '17 20:10


1 Answer

There indeed is an authoritative source: "The Development of the C Language" by the creator of the language, Dennis M. Ritchie:

An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing. For example, to distinguish indirection through the value returned by a function from calling a function designated by a pointer, one writes *fp() and (*pf)() respectively. The style used in expressions carries through to declarations, so the names might be declared

int *fp();
int (*pf)();

In more ornate but still realistic cases, things become worse:

int *(*pfp)();

is a pointer to a function returning a pointer to an integer. There are two effects occurring. Most important, C has a relatively rich set of ways of describing types (compared, say, with Pascal). Declarations in languages as expressive as C—Algol 68, for example—describe objects equally hard to understand, simply because the objects themselves are complex. A second effect owes to details of the syntax. Declarations in C must be read in an `inside-out' style that many find difficult to grasp [Anderson 80]. Sethi [Sethi 81] observed that many of the nested declarations and expressions would become simpler if the indirection operator had been taken as a postfix operator instead of prefix, but by then it was too late to change.


Thus * is on the left in C because it was on the left in B.

B was partially based on BCPL, where the dereferencing operator was !. This was on the left; the binary ! was an array indexing operator:

a!b

is equivalent to !(a+b).

!a

is the content of the cell whose address is given by a; it can appear on the left of an assignment.

Yet the 50-year-old BCPL manual does not even mention the ! operator; instead, the operators were words: unary lv and rv. Since these were understood as if they were functions, it was natural that they preceded the operand; later, the longish rv a could be replaced with the syntactic sugar !a.


Many of the current C operator practices can be traced via this route. Likewise, in B a[b] was equivalent to *(a + b), and therefore to *(b + a) and to b[a], just as in BCPL a!b <=> b!a.

Notice that in B variables were untyped, so certainly similarity with declarations could not have been the reason to use * on the left there.

So the reason unary * is on the left in C is as boring as this: in simpler programs, having the unary * on the left caused no problems, and that was the position where everyone was accustomed to finding the dereferencing operator in other languages, so no one really thought that some other way would have been better until it was too late to change it.
