At my company, we have a growing set of integration tests using JUnit in a Java web application. Each test uses specific external XML files to populate the database with the data needed for that test. The problem is:
Facing this problem, I started thinking about using the system's own CRUD layer to generate the test data for each test. At the beginning of each test, I would run some methods to persist the data that test needs. As I see it, this would solve all three problems, since:
but I lack the experience and knowledge to start with this approach. My questions are: Is this solution effective? Does this approach cause other problems? Where can I find this approach in the literature? Is there a better solution to the problems listed?
It sounds like your existing system uses something like DBUnit: the tests start with a clean database, each test includes a setup step that loads data from one or more XML files into the database, and then the test executes against that data.
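For concreteness, here is a minimal sketch of that kind of setup step with DBUnit (the JDBC connection and the file name `order-test-data.xml` are placeholders for whatever your tests actually use):

```java
import java.io.File;
import java.sql.Connection;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

public class OrderDaoTest {

    // Supplied by your test configuration; a placeholder here.
    private Connection jdbcConnection;

    public void setUpDatabase() throws Exception {
        IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
        // Flat XML: each element names a table, each attribute a column value.
        IDataSet dataSet = new FlatXmlDataSetBuilder()
                .build(new File("order-test-data.xml"));
        // CLEAN_INSERT empties the tables listed in the data set, then inserts
        // its rows, so every test starts from the same known state.
        DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
    }
}
```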
Here are some of the benefits of this kind of approach:
If you have a problem with the CRUD layer, it won't impact the data setup. When something goes wrong, you should get one test failure per error, not a failure in every test whose setup happens to depend on the broken code.
Each test can be very explicit about exactly what data is needed to run it. With a domain model, between things like optional associations and lazy loading, it may not be certain which objects get loaded. (Here I'm thinking especially of Hibernate, where the consequences of a mapping are often complicated.) By contrast, if the data is set up in a more declarative way, stating which rows go in which table, the starting state is explicit.
Keeping tests simple, explicit, and minimally coupled to other parts of the system means there's less to figure out and less to go wrong. If your tests get so complicated that any problem is more likely to be with the test than with the code under test, people will get discouraged from running and updating them.
With DBUnit you can write a script that creates your XML from the database contents, so you can recreate the state you need and save it as XML. There shouldn't be any need to craft your test data by hand.
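A sketch of what such an export script can look like, using DBUnit's `QueryDataSet` and `FlatXmlDataSet.write` (the table names, `WHERE` clauses, and output file are examples, not your schema):

```java
import java.io.FileOutputStream;
import java.sql.Connection;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.database.QueryDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;

public class ExportTestData {

    public static void export(Connection jdbcConnection) throws Exception {
        IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
        // Select only the rows a test actually needs.
        QueryDataSet dataSet = new QueryDataSet(connection);
        dataSet.addTable("CUSTOMER", "SELECT * FROM CUSTOMER WHERE ID = 42");
        dataSet.addTable("ORDERS", "SELECT * FROM ORDERS WHERE CUSTOMER_ID = 42");
        // Writes a flat XML file that a test can later load with CLEAN_INSERT.
        FlatXmlDataSet.write(dataSet, new FileOutputStream("customer-42.xml"));
    }
}
```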
It is possible for test data to become fragmented and hard to update, especially if it has been created in an ad-hoc fashion with no thought for reuse. You might consider going back through the tests and breaking up test setup data into pieces that you can reuse.
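If you do that, DBUnit's `CompositeDataSet` can assemble one test's data from small reusable fragments; a sketch, with placeholder file names:

```java
import java.io.File;

import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.CompositeDataSet;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

public class ReusableSetup {

    public static void load(IDatabaseConnection connection) throws Exception {
        // A shared baseline reused by many tests, plus one scenario-specific file.
        IDataSet baseline = new FlatXmlDataSetBuilder()
                .build(new File("baseline-customers.xml"));
        IDataSet scenario = new FlatXmlDataSetBuilder()
                .build(new File("overdue-order-scenario.xml"));
        DatabaseOperation.CLEAN_INSERT.execute(
                connection, new CompositeDataSet(baseline, scenario));
    }
}
```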
The pain points you describe don't seem to me like they require extreme measures like redoing all your test setups. Even if you do, you'll still want to refactor your test data. Maybe use a smaller project as a proving ground for bigger changes, and make small incremental changes to most of the existing code.
The key to improving maintainability is to keep things DRY. Test data setup should not be redundant, and if your test technology offers no effective means of reuse, you are using the wrong technology.
Writing Java code for the test data setup gives you familiar, powerful tools for reusing code across tests. It also offers better refactoring support than XML, and it makes the link between test data and test code explicit, because both live in the very same source file (or even the same method!). However, it does require the tests to be written and maintained by programmers (not by business analysts, managers, or testers who don't know Java).
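As a sketch of that style, here is a small test-data builder; `Customer` and `CustomerRepository` are hypothetical stand-ins for your real entity and CRUD layer:

```java
// Hypothetical stand-ins for the application's real entity and CRUD layer.
class Customer {
    final String name;
    final int loyaltyPoints;

    Customer(String name, int loyaltyPoints) {
        this.name = name;
        this.loyaltyPoints = loyaltyPoints;
    }
}

interface CustomerRepository {
    Customer save(Customer customer);
}

// Defaults live in one place; each test overrides only what matters to it,
// and persistence goes through the same CRUD code the application uses.
class CustomerBuilder {
    private String name = "Test Customer";
    private int loyaltyPoints = 0;

    static CustomerBuilder aCustomer() {
        return new CustomerBuilder();
    }

    CustomerBuilder named(String name) {
        this.name = name;
        return this;
    }

    CustomerBuilder withLoyaltyPoints(int points) {
        this.loyaltyPoints = points;
        return this;
    }

    Customer persistedBy(CustomerRepository repository) {
        return repository.save(new Customer(name, loyaltyPoints));
    }
}
```

A test setup then reads like `customer = aCustomer().withLoyaltyPoints(500).persistedBy(repository);`, so the data a test depends on is stated right in the test, in refactoring-friendly Java.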
Therefore, if test data is mostly authored and maintained by programmers, I'd do it in Java, through the CRUD layer (or even a full-fledged domain layer) of the real application. If, however, most test data originates from some data export, or is authored by people who are not programmers, a purely data-driven approach can be a better fit. It is also possible to combine these approaches (i.e., choose the most appropriate strategy for each entity).
Personal experience: our team used to do integration tests with DBUnit, but we have switched to setting up the test data as part of the test code, using our real data access layer. In doing so, our tests became more intention-revealing and easier to maintain. Test effort went down, test coverage improved, and more tests got written with less prodding. This was possible because the tests were written and maintained entirely by developers.