
Testing a compiler

I'm currently working on a kind of compiler that was built using SableCC.

Long story short, the compiler takes as input both specification files (this is what we're parsing) and .class files, and instruments the .class files' bytecode to make sure that none of the specifications is violated when the .class files are run (this is a bit like JML / Code Contracts, but way more powerful).

We have a few dozen system tests that cover a large part of the analysis phase (making sure the specifications make sense, and that they are consistent with the .class files they are supposed to specify).

We divided them into two sets: the valid tests and the invalid tests.

  • The valid tests consist of source code files that, when compiled by our compiler, should produce no compiler errors/warnings.

  • The invalid tests consist of source code files that, when compiled by our compiler, should produce at least one compiler error/warning.

This served us well during the analysis phase. The question now is how to test the code generation phase. In the past I've done system tests for a little compiler I developed in a compilers course. Each test consisted of a couple of source files in that language plus an output.txt. When running the test, I'd compile the source files, run the resulting main method, and check that its output was equal to output.txt. All of this was automated, of course.
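In case it helps to picture it, the harness was essentially something like the sketch below. The compiler jar name, the directory layout and the "java Main" invocation are all made up for illustration - the real thing just has to compile, run and diff against output.txt.

```java
import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.junit.Test;

public class OutputFileSystemTest {

    @Test
    public void helloWorld() throws Exception {
        runCase("hello-world");
    }

    // Compiles the sources in system-tests/<name>, runs the resulting program
    // and compares its stdout against the expected output.txt in that folder.
    private void runCase(String name) throws IOException, InterruptedException {
        Path dir = Paths.get("system-tests", name);

        // Made-up compiler invocation; replace with however the compiler is actually run.
        run(dir, "java", "-jar", "mycompiler.jar", "Main.src");
        String actual = run(dir, "java", "Main");

        String expected = new String(
                Files.readAllBytes(dir.resolve("output.txt")), StandardCharsets.UTF_8);
        assertEquals(expected.trim(), actual.trim());
    }

    // Runs a command in the given directory and returns whatever it printed.
    private String run(Path dir, String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd)
                .directory(dir.toFile())
                .redirectErrorStream(true)
                .start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }
}
```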

Now, dealing with this bigger compiler/bytecode instrumenter, things are not so easy. It's no simple task to replicate what I did with my little compiler. I guess the way to go is to step back from system tests at this stage and focus on unit tests.


As any compiler developer knows, a compiler consists of lots of visitors. I'm not too sure how to proceed with unit-testing them. From what I've seen, most of the visitors call a counterpart class that has methods related to that visitor (I guess the idea was to keep the SRP for the visitors).

There are a couple of techniques I could use to unit-test my compiler:

  1. Unit-testing each of a visitor's methods separately. This seems like a good idea for a stackless visitor, but a terrible idea for visitors that use one (or more) stacks. I'd then also unit-test the methods of the standard (read: non-visitor) classes the traditional way.

  2. Unit-testing the whole visitor in one go. That is, I create a tree and visit it; at the end, I verify whether the symbol table was updated correctly. I don't bother mocking the visitor's dependencies (see the sketch after this list).

  3. The same as 2), but now mocking the visitor's dependencies.

  4. What others?
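For option 2, what I have in mind is roughly the following. The node, token, visitor and symbol table class names (AVarDecl, TIdentifier, SymbolTableBuilder, SymbolTable) are placeholders for whatever SableCC generates for our grammar plus our own classes, not the real names:

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// Technique 2: build the smallest tree that exercises the behaviour, visit it
// with the real visitor (no mocks), then inspect the symbol table.
public class SymbolTableBuilderTest {

    @Test
    public void variableDeclarationEndsUpInSymbolTable() {
        // Hand-built fragment of the (placeholder) SableCC AST: "int x;"
        AVarDecl decl = new AVarDecl(new TIdentifier("int"), new TIdentifier("x"));

        SymbolTable table = new SymbolTable();
        SymbolTableBuilder visitor = new SymbolTableBuilder(table);

        decl.apply(visitor); // SableCC nodes are visited via apply(Switch)

        assertTrue(table.contains("x"));
    }
}
```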

I still have the problem that the unit tests will be very tightly coupled to SableCC's AST (which, to be honest, is really ugly).


We are currently not writing any new tests, but I'd like to get the train back on track, as I'm sure that not testing the system is the same as feeding a monster that sooner or later will come back to bite us in the butt when we least expect it ;-(

Has anyone had any experience with compiler testing who could give some awesome advice on how to proceed? I'm kinda lost here!

asked Aug 01 '11 by devoured elysium


1 Answer

I am involved in a project where a Java AST is translated into another language, OpenCL, using the Eclipse compiler, and have similar issues.

I have no magic solutions for you, but I'll share my experience in case it helps.

Your technique of testing with expected output (the output.txt files) is how I started out as well, but it became an absolute maintenance nightmare for the tests. When I had to change the generator or the output for some reason (which happened a few times) I had to rewrite all the expected output files - and there were a huge number of them. I started to not want to change the output at all for fear of breaking all the tests (which was bad), and in the end I scrapped them and instead did the testing on the resulting AST.

This meant I could test the output 'loosely'. For example, if I wanted to test generation of if statements I could just find the one-and-only if statement in the generated class (I wrote helper methods to do all this common AST stuff), verify a few things about it, and be done. That test wouldn't care how the class was named or whether there were extra annotations or comments. This ended up working quite well, as the tests were more focused.

The disadvantage is that the tests were more tightly coupled to the code, so if I ever wanted to rip out the Eclipse compiler/AST library and use something else I'd need to rewrite all my tests. In the end, because the code generation would change over time, I was willing to pay that price.
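To give a rough idea of what those loose checks looked like, here is a sketch using JUnit and the Eclipse JDT DOM (which may or may not be the exact AST layer you end up with). generateClassFor stands in for the real generator entry point, and the asserted condition is made up for the example:

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.List;

import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.IfStatement;
import org.junit.Test;

public class IfGenerationTest {

    @Test
    public void generatesExactlyOneIfWithExpectedCondition() {
        String generated = generateClassFor("IfSample");

        // Collect every if statement in the generated class.
        CompilationUnit unit = parse(generated);
        List<IfStatement> ifs = new ArrayList<>();
        unit.accept(new ASTVisitor() {
            @Override
            public boolean visit(IfStatement node) {
                ifs.add(node);
                return true;
            }
        });

        // The test doesn't care how the class is named or whether there are
        // extra annotations/comments - only that the single if statement has
        // the right condition.
        assertEquals(1, ifs.size());
        assertEquals("x > 0", ifs.get(0).getExpression().toString());
    }

    private static CompilationUnit parse(String source) {
        ASTParser parser = ASTParser.newParser(AST.JLS8);
        parser.setKind(ASTParser.K_COMPILATION_UNIT);
        parser.setSource(source.toCharArray());
        return (CompilationUnit) parser.createAST(null);
    }

    // Stand-in for the real generator entry point; returns a canned example
    // here just so the sketch is self-contained.
    private String generateClassFor(String sample) {
        return "class Generated { void run(int x) { if (x > 0) { x = 0; } } }";
    }
}
```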

I also heavily rely on integration tests - tests that actually compile and run the generated code in the target language. I had way more of these types of tests than unit tests purely because they seemed to be more useful and catch more problems.

As for visitor testing, again I do more integration-style testing with them - get a really small/specific Java source file, load it up with the Eclipse compiler, run one of my visitors over it and check the results. The only other way to test without invoking the Eclipse compiler would be to mock out an entire AST, which was just not feasible - most of the visitors were non-trivial and required a fully constructed/valid Java AST, as they would read annotations from the main class. Most of the visitors were testable in this way because they either generated small OpenCL code fragments or built up a data structure which the unit tests could verify.

Yes, all my tests are very tightly coupled to the Eclipse compiler. But so is the actual software we are writing. Using anything else would mean we'd have to rewrite the whole program anyway, so it's a price we're pretty happy to pay. I guess there is no single solution - you need to weigh up the cost of tight coupling against test maintainability/simplicity.

We also have a fair amount of testing utility code, such as setting up the Eclipse compiler with default settings, code to pull out the body nodes of method trees, etc. We try to keep the tests as small as possible (I know this is probably common sense but possibly worth mentioning).


(Edits/additions below are in response to comments - easier to read/format here than in the comments)

"I also heavily rely on integration tests - tests that actually compile and run the generated code in the target language" What did these tests actually do? How are they different than the output.txt tests?

(Edit again: After re-reading the question I realize our approaches are the same so ignore this)

Rather than just generating source code and comparing it to expected output, which is what I did initially, the integration tests generate OpenCL code, compile it, and run it. All of the generated code produces output, and that output is then compared.

For example, I have a Java class that, if the generator works properly, should generate OpenCL code that sums up the values in two buffers and puts the result in a third buffer. Initially I would have written a text file with the expected OpenCL code and compared against that in my test. Now, the integration test generates the code, compiles it with the OpenCL compiler, runs it, and then checks the resulting values.
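In test form, that buffer-sum case looks roughly like the sketch below. OpenClGenerator, OpenClTestHarness and BufferSum are invented names standing in for the real generator entry point, a small helper that drives the OpenCL compiler/runtime, and the Java class under test:

```java
import static org.junit.Assert.assertArrayEquals;

import org.junit.Test;

public class BufferSumIntegrationTest {

    @Test
    public void generatedKernelSumsTwoBuffers() {
        // Generate the OpenCL kernel from the Java class under test
        // (made-up generator entry point).
        String kernelSource = OpenClGenerator.generateFor(BufferSum.class);

        float[] a = {1f, 2f, 3f};
        float[] b = {10f, 20f, 30f};

        // The (made-up) harness compiles the kernel with the OpenCL compiler,
        // runs it over the two input buffers and reads the output buffer back.
        float[] result = OpenClTestHarness.run(kernelSource, a, b);

        // Check the computed values rather than the generated source text.
        assertArrayEquals(new float[] {11f, 22f, 33f}, result, 1e-6f);
    }
}
```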

"As for visitor testing, again I do more integration-style testing with them - get a really small/specific Java source file, load it up with Eclipse compiler, run one of my visitors with it and check results. " Do you mean run with one of your visitors, or run all the visitors up to the visitor you wanna test?

Most of the visitors could be run independently of each other. Where possible I would run only the visitor I am testing, or, if it depends on others, the minimal set of visitors required (usually just one other was needed). The visitors don't talk directly to each other, but use context objects that are passed around. These can be constructed artificially in the tests to get things into a known state.
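As a rough illustration of constructing the context artificially - every name here is invented for the example, including the shape of the context and the visitor's entry point:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class ArrayAccessVisitorTest {

    @Test
    public void rendersAccessToKnownKernelArgument() {
        // Put the context into the state an earlier visitor would normally
        // have left it in, without actually running that visitor.
        GenerationContext context = new GenerationContext();
        context.registerKernelArgument("data", "float*");

        // Run only the visitor under test against that context.
        ArrayAccessVisitor visitor = new ArrayAccessVisitor(context);
        visitor.generate("data", 0); // hypothetical entry point for one array access

        assertEquals("data[0]", context.emittedCode().trim());
    }
}
```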

Other question, do you use mocks -- at all, in this project? Moreover, do you regularly use mocks in other projects? I'm just trying to get a clear picture about the person I'm talking with :P

In this project we use mocks in about 5% of tests, probably even less. And I don't mock out any Eclipse compiler stuff.

The thing with mocks is that I'd need to understand well what I'm mocking out, and that is not the case with the Eclipse compiler. There are a whole lot of visitor methods that get called, and sometimes I'm not sure which one should be called (e.g. is visit ExtendedStringLiteral or visit StringLiteral called for string literals?). If I mocked this out and assumed one or the other, that might not correspond to reality, and the program would fail even though the tests passed - not desired. The only mocks we have are a couple for the annotation processor API, a couple of Eclipse compiler adapters, and some of our own core classes.

In other projects, such as Java EE stuff, more mocks were used, but I'm still not an avid user of them. The more defined, understood and predictable an API is, the more likely I am to consider using mocks.

The first phases of our program are just like those of a regular compiler. We extract info from the source files and fill up a (big and complex!) symbol table. How would you go about system testing this? In theory, I could create a test with the source files and also a symbolTable.txt (or .xml or whatever) that contains all the info about the symbol table, but that would, I think, be a bit complex to do. Each one of those integration tests would be a complex thing to accomplish!

I'd try to take the approach of testing small bits of the symbol table rather than the whole lot in one go. If I were testing whether a Java tree was built correctly, I'd have something like:

  • one test just for if statements:

    • have source code with one method containing one if statement
    • builds symboltable / tree from this source
    • pull out the statement tree from the only method body of the main class (fail the test if there isn't exactly one class, one method body, and one top-level statement node in that method body)
    • compare the if statement's node attributes (condition, body) programmatically
  • at least one test for each other kind of statement in a similar style.

  • other tests, maybe for multiple statements, etc. or whatever is needed

This approach is integration-style testing, but each integration test only tests a small part of the system.

Essentially I'd try to keep the tests as small as possible. A lot of the testing code for pulling out bits of the tree can be moved into utility methods to keep the test classes small.
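Putting those two points together, the if-statement test from the list above might look something like the sketch below. I've written it against the Eclipse JDT DOM since that's what I know; with a SableCC front end the shape would be the same, just with your generated node classes and your symbol table instead:

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.List;

import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.Block;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.IfStatement;
import org.eclipse.jdt.core.dom.MethodDeclaration;
import org.eclipse.jdt.core.dom.Statement;
import org.eclipse.jdt.core.dom.TypeDeclaration;
import org.junit.Test;

public class IfStatementTreeTest {

    @Test
    public void ifStatementIsBuiltWithConditionAndBody() {
        Statement only = onlyStatementOfOnlyMethod(
                "class Sample { void m(int x) { if (x > 0) { x = 0; } } }");

        assertTrue(only instanceof IfStatement);
        IfStatement ifStmt = (IfStatement) only;
        assertEquals("x > 0", ifStmt.getExpression().toString());
        assertTrue(ifStmt.getThenStatement() instanceof Block);
    }

    // Utility: parse the source and fail unless there is exactly one class
    // with exactly one method containing exactly one top-level statement.
    private static Statement onlyStatementOfOnlyMethod(String source) {
        ASTParser parser = ASTParser.newParser(AST.JLS8);
        parser.setKind(ASTParser.K_COMPILATION_UNIT);
        parser.setSource(source.toCharArray());
        CompilationUnit unit = (CompilationUnit) parser.createAST(null);

        assertEquals(1, unit.types().size());
        TypeDeclaration type = (TypeDeclaration) unit.types().get(0);
        assertEquals(1, type.getMethods().length);
        MethodDeclaration method = type.getMethods()[0];
        List<?> statements = method.getBody().statements();
        assertEquals(1, statements.size());
        return (Statement) statements.get(0);
    }
}
```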

I thought that maybe I could create a pretty printer that would take the symbol table and output the corresponding source files (which, if everything was OK, would be just like the original source files). The problem is that the original files can have things in a different order than what my pretty printer prints. I'm afraid that with this approach I might just be opening another can of worms. I've been relentlessly refactoring parts of the code and the bugs are starting to show up. I really need some integration tests to keep me on track.

That's exactly the approach I've taken. However, in my system the order of things doesn't change much. I have generators that essentially output code in response to Java AST nodes, but there is a bit of freedom in that generators can call themselves recursively. For example, the 'if' generator that gets fired off in response to a Java if-statement AST node can write out 'if (', then ask other generators to render the condition, then write ') {', ask other generators to write out the body, then write '}'.
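The shape of such a generator is roughly this; GeneratorDispatch is a made-up name for whatever routes AST nodes to the generator responsible for them, and the output sink here is just a StringBuilder for simplicity:

```java
import org.eclipse.jdt.core.dom.IfStatement;

// The 'if' generator emits the fixed syntax itself and hands the condition and
// body back to the dispatcher, which picks the right generator for each node,
// so generators end up being invoked recursively.
public class IfGenerator {

    private final StringBuilder out;
    private final GeneratorDispatch generators; // hypothetical routing mechanism

    public IfGenerator(StringBuilder out, GeneratorDispatch generators) {
        this.out = out;
        this.generators = generators;
    }

    public void generate(IfStatement node) {
        out.append("if (");
        generators.generate(node.getExpression());    // condition rendered by other generators
        out.append(") {\n");
        generators.generate(node.getThenStatement()); // body rendered recursively
        out.append("}\n");
    }
}
```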

answered by prunge