This is a difficult and open-ended question I know, but I thought I'd throw it to the floor and see if anyone had any interesting suggestions.
I have developed a code-generator that takes our python interface to our C++ code (generated via SWIG) and generates code needed to expose this as WebServices. When I developed this code I did it using TDD, but I've found my tests to be brittle as hell. Because each test essentially wanted to verify that for a given bit of input code (which happens to be a C++ header) I'd get a given bit of outputted code I wrote a small engine that reads test definitions from XML input files and generates test cases from these expectations.
The problem is I dread going in to modify the code at all. That and the fact that the unit tests themselves are a: complex, and b: brittle.
So I'm trying to think of alternative approaches to this problem, and it strikes me I'm perhaps tackling it the wrong way. Maybe I need to focus more on the outcome, IE: does the code I generate actually run and do what I want it to, rather than, does the code look the way I want it to.
Has anyone got any experiences of something similar to this they would care to share?
I started writing up a summary of my experience with my own code generator, then went back and re-read your question and found you had already touched upon the same issues yourself, focus on the execution results instead of the code layout/look.
Problem is, this is hard to test, the generated code might not be suited to actually run in the environment of the unit test system, and how do you encode the expected results?
I've found that you need to break down the code generator into smaller pieces and unit test those. Unit testing a full code generator is more like integration testing than unit testing if you ask me.
Recall that "unit testing" is only one kind of testing. You should be able to unit test the internal pieces of your code generator. What you're really looking at here is system level testing (a.k.a. regression testing). It's not just semantics... there are different mindsets, approaches, expectations, etc. It's certainly more work, but you probably need to bite the bullet and set up an end-to-end regression test suite: fixed C++ files -> SWIG interfaces -> python modules -> known output. You really want to check the known input (fixed C++ code) against expected output (what comes out of the final Python program). Checking the code generator results directly would be like diffing object files...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With