
Should I use real or sample data for unit tests?

Tags: unit-testing

I'm writing a parser for the output of a legacy application, and since there is no spec for the file syntax, I've gathered as many sample files as I could.

Now I'm writing the unit tests before implementing the parser (because there is no other sane way to do this), but I'm not sure whether I should:

  • use the real files produced by the application, reading them in and comparing the parser's output against the expected output stored as JSON in a separate file, or
  • create a sample string with the tokens and cases I want to test, plus a dict (this is Python) with the expected output (a rough sketch of both options follows below).
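
To make the two options concrete, here's a rough pytest-style sketch; the parser module, function name, file paths, and token syntax are all placeholders, not the real thing:

```python
import json

from legacy_parser import parse  # placeholder for the parser module under test


def test_real_file_against_json_fixture():
    # Option 1: parse a captured real file and compare with the expected
    # output stored as JSON alongside it.
    with open("samples/report_001.txt") as src:
        result = parse(src.read())
    with open("samples/report_001.expected.json") as exp:
        assert result == json.load(exp)


def test_crafted_snippet():
    # Option 2: a hand-written string exercising just the tokens under test.
    snippet = 'NAME="foo";COUNT=3'
    assert parse(snippet) == {"NAME": "foo", "COUNT": 3}
```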

I'm inclined toward the second alternative because I would test only what I need to, without all the "real-world" data included in the actual files, but I'm afraid I could forget to test one possibility or another.

What do you think?

asked Nov 30 '10 by Luiz Geron

People also ask

Should unit tests use real database?

Database access falls outside the scope of unit testing, so you would not write unit tests that include database access. You would include database access testing in your functional tests. Similarly, if you have a network app, you would not include network access in your unit tests.

Should you use mocks in unit tests?

It is unlikely for mocking to be applicable in unit tests, as needing a mock means there is a part of the system the unit depends on, making that unit less isolated and less suited to unit testing. Whenever you reach for mocks in a unit test, that is a good sign you are in fact writing an integration test.


1 Answer

My suggestion is to do both. Write a set of integration tests that run through all the files you have against their expected outputs, then write unit tests with small, hand-crafted inputs to isolate the parsing logic.
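
Something like this is what I mean by the split (pytest-style; the parser module, sample paths, and fixture naming convention are just placeholders):

```python
import glob
import json
import os

import pytest

from legacy_parser import parse  # placeholder for your parser module

SAMPLE_FILES = sorted(glob.glob("samples/*.txt"))


# Integration tests: every captured real file must match its stored JSON fixture.
@pytest.mark.parametrize("path", SAMPLE_FILES)
def test_real_file_matches_fixture(path):
    with open(path) as src:
        result = parse(src.read())
    fixture = os.path.splitext(path)[0] + ".expected.json"
    with open(fixture) as exp:
        assert result == json.load(exp)


# Unit tests: tiny hand-written inputs that pin down one parsing rule each.
def test_empty_input_yields_empty_dict():
    assert parse("") == {}
```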

I would recommend writing the integration tests first so you build your parser outside-in. It might be discouraging to see a bunch of failing tests at the start, but it'll help you isolate your edge cases earlier.

By the way, I think this is a great question. I recently ran into a similar problem: transforming large XML feeds from an upstream system into a proprietary format. My solution was to write a set of black-box integration tests for the full feeds, checking things like record counts and other high-level success metrics, then break the inputs down into smaller and smaller chunks until I was able to test all the permutations of the data. Only then did I have a good understanding of how to build the parser.
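
The first black-box tests looked roughly like this (transform_feed, the feed path, and the expected count are illustrative placeholders, not the actual system):

```python
from legacy_transform import transform_feed  # placeholder for the real transformer

EXPECTED_RECORD_COUNT = 14382  # illustrative: the record count reported by the upstream system


def test_full_feed_record_count():
    # Black-box check on the whole feed: don't inspect individual records yet,
    # just confirm nothing was dropped or duplicated during the transform.
    with open("feeds/full_feed.xml") as src:
        records = transform_feed(src.read())
    assert len(records) == EXPECTED_RECORD_COUNT
```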

answered Sep 29 '22 by jonnii