What is considered the best approach to unit test a complex unit such as a compiler?
I've written a few compilers and interpreters over the years, and I do find this kind of code quite hard to test in a good way.
If we take something like the Abstract Syntax Tree generation. how would you test this using TDD?
Small constructs might be easy to test. e.g. something along the lines of:
string code = @"public class Foo {}";
AST ast = compiler.Parse(code);
Since that won't generate alot of ast nodes.
But if I actually want to test that the compiler can generate an AST for something like a method:
[TestMethod]
public void Can_parse_integer_instance_method_in_class ()
{
string code = @"public class Foo { public int method(){ return 0;}}";
AST ast = compiler.Parse(code);
What would you assert on? Manually defining an AST that represents the given code and make an assertion that the generated AST conforms to the manually defined AST seems horribly combersome and might even be error prone.
So what are the best tactics for TDD'ing complex scenarios like this?
Firstly, if you test a compiler, you cannot get enough tests! Users really rely on compiler-generated output as if it were an always-golden-standard, so really be aware of quality. So if you can, test with every test you can come up with!
Secondly, use all testing methods available, yet use them where appropriate. Indeed, you may be able to mathematically prove, that a certain transformation is correct. If you are able to do so, you should.
But every compiler that I've seen some internals of involves heuristics and a lot of optimized, hand-crafted code at its internals; thus assisted proving methods typically are not applicable any more. Here, testing comes in place, and I mean a lot of it!
When collecting tests, please consider different cases:
Now, this sounds like an awful lot of work! Yes it does, but there is help: there are several, commercial test-suites for (C-) compilers out in the world and experts that might help you in applying them. Here a small list of those known to me:
First of all parsing is usually a trivial part of compiler project. From my experience it never takes more than 10% of the time (unless we are talking about C++ but you wouldn't be asking questions here if you were designing it) so you'd rather not invest much of your time into parser tests.
Still, TDD (or however you call it) has it's share in developing the middle-end where you often want to verify that e.g. optimizations that you've just added actually did result in expected code transformation. From my experience, tests like this are usually implemented by giving compiler specially crafted test programs and grepping output assembly for expected patterns (was this loop unrolled four times? did we manage to avoid memory writes is this function? etc.). Grepping assembly isn't as good as analyzing structured representation (S-exprs or XML) but it's cheap and works fine in most cases. It's awfully hard to support as your compiler grows though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With