How does GCC know what line an error is on when the compiler takes all whitespace and comments out of the code?

I'm sure this applies to other compilers as well, but I've only used GCC. If the compiler optimizes the code by removing everything extraneous that isn't code (comments, whitespace, etc.), how does it correctly show what line an error is on in the original file? Does it only optimize the code after checking for errors? Or does it somehow insert tags so that if an error is found it knows what line it's on?

mycode.cpp: In function ‘foo(int bar)’:
mycode.cpp:59: error: no matching function for call to ‘bla(int bar)’
Nick Sweeting asked Oct 09 '13 13:10


2 Answers

The compiler converts source code to an object format, or more correctly here, to an intermediate format which is later used to generate the object format. I've not looked into the internals of g++, but typically a compiler will tokenize the input and build a tree structure. When doing so, it annotates each node of the tree with the position in the file where it read the token that the node represents. Many errors are detected during this very parsing; for those that aren't, the compiler uses the position stored on the annotated node in the error message.

With regard to "removing everything extraneous that isn't code", that's true in the sense that the compiler tokenizes the input and converts it into the tree. But when doing so, it is reading the files; at every point, it is either reading the file, or accessing a node which was annotated while the file was being read.
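To make the annotation idea concrete, here is a minimal sketch of a tokenizer that throws away whitespace and `//` comments but still counts newlines as it goes, so every token keeps the line it came from. This is not how g++ is implemented; the `Token` struct and the `tokenize` function are hypothetical names made up for the illustration, and the "language" it accepts is just identifiers, numbers, and single-character punctuation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type: the token text plus the line it was read from.
struct Token {
    std::string text;
    int line;
};

// Returns true for characters that may appear in an identifier or number.
static bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Splits `src` into tokens. Whitespace and // comments are discarded,
// but every newline still bumps `line`, so positions survive.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    int line = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (c == '\n') {
            ++line;  // the newline is dropped, the count is kept
            ++i;
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;     // other whitespace is simply skipped
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            // Skip the comment body; its newline is handled above.
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (is_word_char(c)) {
            std::size_t start = i;
            while (i < src.size() && is_word_char(src[i])) ++i;
            tokens.push_back({src.substr(start, i - start), line});
        } else {
            tokens.push_back({std::string(1, c), line});  // punctuation
            ++i;
        }
    }
    return tokens;
}
```

Calling `tokenize("int x;\n// comment\n\n  bla(x);\n")` yields eight tokens, and the `bla` token carries line 4 even though the comment and the blank line in front of it left no tokens behind. A later pass that fails to resolve `bla` only has to read that stored number to print a message like the one in the question.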

James Kanze answered Nov 15 '22 03:11


The preprocessor (conceptually) adds #line directives, to tell the compiler which source file and line number correspond to each line of preprocessed source. They look like

// set the current line number to 100, in the current source file
#line 100

// set the current line number to 1, in a header file
#line 1 "header.h"

(Of course, a modern preprocessor usually isn't a separate program, and usually doesn't generate an intermediate text representation, so these are actually some kind of metadata passed to the compiler along with the stream of preprocessed tokens; but it may be simpler, and not significantly incorrect, to think in terms of preprocessed source.)

You can add these yourself if you want. Possible uses are testing macros that use the __FILE__ and __LINE__ definitions, and laying traps for maintenance programmers.
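As a small sketch of adding one yourself, the snippet below uses a `#line` directive and then reads back `__LINE__` and `__FILE__`. The file name `pretend.h` and the two helper functions are invented for the example; no such header needs to exist.

```cpp
#include <string>

// From the next line on, the compiler behaves as if it were reading
// line 100 of a file called "pretend.h" (a made-up name).
#line 100 "pretend.h"
int reported_line() { return __LINE__; }          // counted as line 100
std::string reported_file() { return __FILE__; }  // counted as line 101
```

Here `reported_line()` returns 100 and `reported_file()` returns "pretend.h", regardless of where the code really lives. Any diagnostics for code after the directive are reported against `pretend.h` as well, which is exactly the trap mentioned above.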

Mike Seymour answered Nov 15 '22 04:11