
External Data File for Unit Tests

I'm a newbie to Unit Testing and I'm after some best practice advice. I'm coding in Cocoa using Xcode.

I've got a method that validates a URL that a user enters. I want it to accept only the http:// protocol and only URLs made up of valid characters.

Is it acceptable to have one test for this and use a test data file? The data file provides example valid/invalid URLs and whether or not the URL should validate. I'm also using this to check the description and domain of the error message.

Why I'm doing this

I've read Pragmatic Unit Testing in Java with JUnit and this gives an example with an external data file, which makes me think this is OK. Plus it means I don't need to write lots of unit tests with very similar code just to test different data.

But on the other hand...

If I'm testing for:

  • invalid characters
  • and an invalid protocol
  • and valid URLs

all in the same test data file (and therefore in the same test), will this cause me problems later on? I read that one test should only fail for one reason.

Is what I'm doing OK?

How do other people use test data in their unit tests, if at all?

John Gallagher asked Jul 04 '09 22:07


2 Answers

In general, use a test data file only when it's necessary. There are a number of disadvantages to using a test data file:

  • The code for your test is split between the test code and the test data file. This makes the test more difficult to understand and maintain.
  • You want to keep your unit tests as fast as possible. Having tests that unnecessarily read data files can slow down your tests.

There are a few cases where I do use data files:

  • The input is large (for example, an XML document). While you could use String concatenation to create a large input, it can make the test code hard to read.
  • The test is actually testing code that reads a file. Even in this case, you might want to have the test write a sample file in a temporary directory so that all of the code for the test is in one place.
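As a rough sketch of that second case, the test below creates its own input in a temporary file, so the data and the expectation sit side by side. The file-reading method `countNonBlankLines` is a made-up stand-in for whatever code you are really testing, and the class name is only for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class TempFileTestSketch {
    // Hypothetical code under test: counts the non-blank lines in a file.
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    // Writes the sample input, runs the code under test, then cleans up.
    static long runTest() {
        try {
            Path sample = Files.createTempFile("urls", ".txt");
            try {
                Files.write(sample,
                        Arrays.asList("http://foo.com", "", "http://foo.com/home"),
                        StandardCharsets.UTF_8);
                return countNonBlankLines(sample);
            } finally {
                Files.deleteIfExists(sample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long count = runTest();
        if (count != 2) {
            throw new AssertionError("expected 2 non-blank lines but got " + count);
        }
    }
}
```

In a real JUnit test the body of `runTest` would live in the test method and the final check would be an `assertEquals`.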

Instead of encoding the valid and invalid URLs in a file, I suggest writing the tests in code: one test for invalid characters, one for invalid protocol(s), one for invalid domain(s), and one for a valid URL. If you don't think that gives enough coverage, you can create a mini integration test that checks multiple valid and invalid URLs. Here's an example in Java and JUnit:

// Mini integration test: several valid URLs checked in one test method.
public void testManyValidUrls() {
  UrlValidator validator = new UrlValidator();
  assertValidUrl(validator, "http://foo.com");
  assertValidUrl(validator, "http://foo.com/home");
  // more asserts here
}

// Helper that names the offending URL in the failure message.
private static void assertValidUrl(UrlValidator validator, String url) {
  assertTrue(url + " should be considered valid", validator.isValid(url));
}
NamshubWriter answered Sep 28 '22 08:09


While I think this is a perfectly reasonable question to ask, I don't think you should be overly concerned about this. Strictly speaking, you are correct that each test should only test for one thing, but that doesn't preclude your use of a data file.

If your System Under Test (SUT) is a simple URL parser/validator, I assume that it takes a single URL as a parameter. As such, there's a limit to how much simultaneously invalid data you can feed into it. Even if you feed in a URL that contains both invalid characters and an invalid protocol, it would only cause a single result (that the URL was invalid).

What you are describing is a Data-Driven Test (also called a Parameterized Test). If you keep the test itself simple, feeding it with different data is not problematic in itself.

What you do need to be concerned about is being able to quickly locate why a test fails when/if that happens some months from now. If your test output points to a specific row in your test data file, you should be able to quickly figure out what went wrong. On the other hand, if the only message you get is that the test failed and any of the rows in the file could be at fault, you will begin to see the contours of a test maintainability nightmare.
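To make the row-reporting point concrete, here is a minimal sketch of a data-driven check that names the failing row. Everything in it is assumed for illustration: the `url,expected` row format, the class name, and the regex-based `isValid`, which only stands in for your real validator. The rows are given as strings here; in practice they would be the lines read from the data file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DataDrivenUrlCheck {
    // Hypothetical stand-in for the validator under test:
    // http:// only, and a deliberately small set of allowed characters.
    static boolean isValid(String url) {
        return url.matches("http://[A-Za-z0-9./_-]+");
    }

    // Each row is "url,expected" (assumes the URL itself contains no comma).
    // Returns one message per failing row, including the 1-based row number.
    static List<String> check(List<String> rows) {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            String[] fields = rows.get(i).split(",", 2);
            String url = fields[0];
            boolean expected = Boolean.parseBoolean(fields[1]);
            boolean actual = isValid(url);
            if (actual != expected) {
                failures.add("row " + (i + 1) + ": " + url
                        + " expected valid=" + expected + " but was " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "http://foo.com,true",
                "ftp://foo.com,false",
                "http://foo bar.com,false");
        List<String> failures = check(rows);
        if (!failures.isEmpty()) {
            throw new AssertionError(String.join("\n", failures));
        }
    }
}
```

Collecting every failing row before reporting, rather than stopping at the first, means a single run tells you about all the bad rows at once.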

Personally, I lean slightly towards having the test data as closely associated with the tests as possible. That's because I view the concept of Tests as Executable Specifications as very important. When the test data is hard-coded within each test, it can very clearly specify the relationship between input and expected output. The more you remove the data from the test itself, the harder it becomes to read this 'specification'.

This means that I tend to define the values of input data within each test. If I have to write a lot of very similar tests where the only variation is input and/or expected output, I write a Parameterized Test, but still invoke that Parameterized Test from hard-coded tests (each of which is only a single line of code). I don't think I've ever used an external data file.
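A minimal sketch of that pattern, written in plain Java so it runs without a test framework. The names `assertValidation` and `isValid` are assumptions, and the regex is only a placeholder for the real validator:

```java
public class UrlValidatorSpec {
    // Hypothetical stand-in for the validator under test.
    static boolean isValid(String url) {
        return url.matches("http://[A-Za-z0-9./_-]+");
    }

    // The parameterized check: the only things that vary are input and expected result.
    static void assertValidation(String url, boolean expected) {
        if (isValid(url) != expected) {
            throw new AssertionError(
                    url + " should be " + (expected ? "valid" : "invalid"));
        }
    }

    // Hard-coded tests, one line each, so input and expected output stay visible.
    static void testAcceptsPlainHttpUrl() { assertValidation("http://foo.com", true); }
    static void testRejectsFtpProtocol()  { assertValidation("ftp://foo.com", false); }
    static void testRejectsSpaceInHost()  { assertValidation("http://foo bar.com", false); }

    public static void main(String[] args) {
        testAcceptsPlainHttpUrl();
        testRejectsFtpProtocol();
        testRejectsSpaceInHost();
    }
}
```

Each one-line test fails for exactly one reason, and its name tells you which rule was broken.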

But then again, these days, I don't even know what my input is, since I use Constrained Non-Determinism. Instead, I work with Equivalence Classes and Derived Values.
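Roughly, and only as an illustration of the idea rather than of any particular library, this means drawing an arbitrary input from an equivalence class and deriving the rest of the test data from it. In this sketch, `anyHost` and the regex-based `isValid` are both invented for the example:

```java
import java.util.Random;

public class EquivalenceClassSketch {
    // Hypothetical stand-in for the validator under test.
    static boolean isValid(String url) {
        return url.matches("http://[A-Za-z0-9./_-]+");
    }

    // Draws an arbitrary member of the "well-formed host" equivalence class:
    // 1 to 10 lowercase letters followed by ".com".
    static String anyHost(Random random) {
        String letters = "abcdefghijklmnopqrstuvwxyz";
        StringBuilder host = new StringBuilder();
        int length = 1 + random.nextInt(10);
        for (int i = 0; i < length; i++) {
            host.append(letters.charAt(random.nextInt(letters.length())));
        }
        return host.append(".com").toString();
    }

    static void testAnyHttpUrlIsValid(Random random) {
        String url = "http://" + anyHost(random); // derived from the anonymous host
        if (!isValid(url)) {
            throw new AssertionError(url + " should be valid");
        }
    }

    static void testAnyFtpUrlIsInvalid(Random random) {
        String url = "ftp://" + anyHost(random); // same host, wrong protocol
        if (isValid(url)) {
            throw new AssertionError(url + " should be invalid");
        }
    }

    public static void main(String[] args) {
        Random random = new Random();
        testAnyHttpUrlIsValid(random);
        testAnyFtpUrlIsInvalid(random);
    }
}
```

Because the failure message includes the generated URL, a failing run can still be reproduced even though the input was not fixed in advance.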

Mark Seemann answered Sep 28 '22 07:09