This question popped into my head today at work when I was having yet another domestic affair with my compiler. Despite my buff pinky (due to all the semicolon pressing I do at work), I managed to miss one before an if statement. Obviously, this resulted in a compile error:
error C2143: syntax error : missing ';' before 'if'
So I wondered, "Well gee, why can't you tell me the line that's missing the semicolon instead of the line after the problem?" I then proceeded to experiment with other similar syntax errors:
error C2065: 'myUndeclared' : undeclared identifier
error C2143: syntax error : missing ')' before 'if'
etc...
Now, all of those errors would, similarly, take me to the line after the problem and complain about something before the if statement.
Consider the following:
SomeFunction(x) //Notice, there is no ';' here
if(bSomeCondition)
{
...
}
I get two compile errors:
(Line 265) error C2065: 'x' : undeclared identifier
(Line 266) error C2143: syntax error : missing ';' before 'if'
However, the first error correctly tells me the line number, despite the missing semicolon. This suggests to me that the compiler doesn't get tripped up in parsing and is able to make it past the semicolon problem. So why does the compiler insist on reporting grammatical errors this way? Other (non-grammatical) errors are reported on the lines where they are found. Does this have to do with the compiler making multiple passes? Basically, I hope someone with a working knowledge of the C++ compiler might explain specifically what the compiler is doing that necessitates reporting errors in this "before" way.
The short answer to the more general question of "Why do C/C++ error messages suck" is "Sometimes C++ is really hard to parse" (it doesn't actually have a context-free grammar). However, this isn't really a valid reason: one can still make tools that record better diagnostic information than most C++ compilers do.
The more practical answer is "Compiler authors have inherited legacy codebases which didn't value error messages", combined with a mild dose of "compiler authors are lazy", topped with "Diagnostic reporting isn't an exciting problem". Most compiler writers would rather add a new language feature or a 3% codegen performance improvement than do significant refactoring on the codebase to allow decent error reporting. The specific question about "Why aren't errors properly localised to the line that 'caused' them" is an instance of this. There's not really a technical reason compilers can't generally work out that a ; is missing, and then tell you about the source span of the last ;-lacking statement, even in the presence of C++'s general whitespace invariance. It's just that storing that information has (largely) been historically ignored.
That said, new compilers not hampered by decades of old code are doing much better. Have a look at the Clang compiler, which prides itself on sensible error messages. The page on diagnostics shows how much better than GCC they are. An example for this case being:
$ gcc-4.2 t.c
t.c: In function 'foo':
t.c:5: error: expected ';' before '}' token
$ clang t.c
t.c:4:8: error: expected ';' after expression
bar()
^
;
Or, more impressively:
$ cat t.cc
template<class T>
class a {}
class temp {};
a<temp> b;
struct b {
}
$ gcc-4.2 t.cc
t.cc:3: error: multiple types in one declaration
t.cc:4: error: non-template type 'a' used as a template
t.cc:4: error: invalid type in declaration before ';' token
t.cc:6: error: expected unqualified-id at end of input
$ clang t.cc
t.cc:2:11: error: expected ';' after class
class a {}
^
;
t.cc:6:2: error: expected ';' after struct
}
^
;
Look, it's even telling us what to type where to fix the problem! </clang_salespitch>
Because in C++, white-space doesn't matter, on the whole. So this is valid code:
SomeFunction(x)
;if(bSomeCondition)
{
...
}
So the compiler message is simply reporting that a semi-colon hasn't appeared somewhere before the if.
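To make the whitespace point concrete, here is a minimal sketch of a toy lexer. It is not how any real compiler is written; the function name tokenize and its splitting rules are assumptions made for illustration. It throws away all whitespace, including newlines, so both layouts of the snippet produce the same token stream and the parser cannot tell which line the ';' was "meant" to be on.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer: splits the input into identifier/number words and
// single-character punctuation. All whitespace, newlines included, is dropped.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens; it is never a token
        } else if (std::isalnum(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));  // one punctuation character
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenize on "SomeFunction(x);\nif(b){}" and on "SomeFunction(x)\n;if(b){}" returns equal vectors, which is why the message is phrased in terms of tokens ("before 'if'") rather than lines.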
In this code:
SomeFunction(x)
if (y) {
}
As you said, the error would be reported on line 2 as missing ';' before 'if'.
There is nothing wrong with line 1. It's perfectly valid without a semi-colon, and several things other than a semi-colon could legally follow it (such as a dot, a math operator, an assignment, or a pointer operator).
So, reporting the error on the previous line may not always make sense. Take this example:
SomeFunction(x)
+= 10
- 5
// blank line
// blank line
if (y) {
}
Which line has the error? The line with the - 5? Or one of the comment lines? To the compiler, the error is actually with the 'if', since it is the first place that something can be detected as being wrong. To report a different line, the compiler would have to report the last properly parsed token as the error, rather than the first place the error is detected. That sounds a little backwards, and saying that // blank line is missing a semi-colon is even more confusing, since changing it to // blank line; would of course not change or fix the error.
By the way, this is not unique to C or C++. This is a common way to report errors in most parsers.
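As a rough sketch of the two reporting choices discussed above, the following toy checker keeps a line number on every token. Everything in it (Token, Diagnostic, lex, findMissingSemicolons, and the crude rule that an 'if' must follow ';', '{', '}' or 'else') is a hypothetical simplification for illustration, not how a real C++ parser decides anything. The point is only that once tokens carry their location, both "the line where the problem was detected" and "the line of the last token that parsed fine" are available to report.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    std::string text;
    int line;  // 1-based line on which the token starts
};

struct Diagnostic {
    int detectedLine;         // line of the token where the error was noticed
    int lastGoodLine;         // line of the last token that parsed successfully
    std::string beforeToken;  // text of the token where the error was noticed
};

// Toy lexer: words and single-character punctuation; skips whitespace and
// // comments while counting newlines.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    int line = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        unsigned char uc = static_cast<unsigned char>(c);
        if (c == '\n') {
            ++line;
            ++i;
        } else if (std::isspace(uc)) {
            ++i;
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') ++i;  // comment runs to end of line
        } else if (std::isalnum(uc) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            out.push_back({src.substr(start, i - start), line});
        } else {
            out.push_back({std::string(1, c), line});
            ++i;
        }
    }
    return out;
}

// Crude stand-in for a parser (an assumption of this sketch): an 'if' is only
// accepted directly after ';', '{', '}' or 'else'.
std::vector<Diagnostic> findMissingSemicolons(const std::vector<Token>& toks) {
    std::vector<Diagnostic> diags;
    for (std::size_t i = 1; i < toks.size(); ++i) {
        const std::string& prev = toks[i - 1].text;
        if (toks[i].text == "if" && prev != ";" && prev != "{" && prev != "}" &&
            prev != "else") {
            diags.push_back({toks[i].line, toks[i - 1].line, toks[i].text});
        }
    }
    return diags;
}
```

Run on the seven-line example with the += 10 and - 5 continuation lines, this yields one diagnostic whose detectedLine is the line of the if and whose lastGoodLine is the line holding the 5. A "before 'if'" message uses the first number; an "expected ';' after expression" message in the style Clang prints uses the second.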