
Unit Testing - Algorithm or Sample based?

Say I'm trying to test a simple Set class

public class IntSet : IEnumerable<int>
{
    public void Add(int i) {...}
    //IEnumerable implementation...
}

And suppose I'm trying to test that no duplicate values can exist in the set. My first option is to insert some sample data into the set, and test for duplicates using my knowledge of the data I used, for example:

    //OPTION 1
    void InsertDuplicateValues_OnlyOneInstancePerValueShouldBeInTheSet()
    {
        var set = new IntSet();

        //3 will be added 3 times
        var values = new List<int> {1, 2, 3, 3, 3, 4, 5};
        foreach (int i in values)
            set.Add(i);

        //I know 3 is the only candidate to appear multiple times
        int counter = 0;
        foreach (int i in set)
            if (i == 3) counter++;

        Assert.AreEqual(1, counter);
    }

My second option is to test for my condition generically:

    //OPTION 2
    void InsertDuplicateValues_OnlyOneInstancePerValueShouldBeInTheSet()
    {
        var set = new IntSet();

        //The following could even be a list of random numbers with a duplicate
        var values = new List<int> { 1, 2, 3, 3, 3, 4, 5};
        foreach (int i in values)
            set.Add(i);

        //I am not using my prior knowledge of the sample data 
        //the following line would work for any data
        CollectionAssert.AreEquivalent(new HashSet<int>(values), set);
    } 

Of course, in this example, I conveniently have a set implementation to check against, as well as code to compare collections (CollectionAssert). But what if I didn't have either? The code would definitely be more complicated than that of the previous option! And this is the situation you are in when testing real-life custom business logic.

Granted, testing for expected conditions generically covers more cases, but it becomes very similar to implementing the logic again (which is both tedious and useless: you can't use the same code to check itself!). Basically I'm asking whether my tests should look like "insert 1, 2, 3 then check something about 3" or "insert 1, 2, 3 and check for something in general".

EDIT - To help me understand, please state in your answer whether you prefer OPTION 1 or OPTION 2 (or neither, or that it depends on the case, etc). Just to clarify, it's pretty clear that in this case (IntSet), option 2 is better in all aspects. However, my question pertains to the cases where you don't have an alternative implementation to check against, so the code in option 2 would definitely be more complicated than in option 1.

Ohad Schneider asked Jan 17 '11 13:01


4 Answers

I usually prefer to test use cases one by one, which works nicely in the TDD manner: "code a little, test a little". Of course, after a while my test cases start to contain duplicated code, so I refactor. The actual method of verifying the results does not matter to me as long as it reliably works and doesn't get in the way of testing itself. So if there is a "reference implementation" to test against, all the better.

An important thing, however, is that the tests should be reproducible and it should be clear what each test method is actually testing. To me, inserting random values into a collection is neither. Of course, if there is a huge amount of data or a huge number of use cases involved, any tool or approach is welcome that helps to handle the situation better without lulling me into a false sense of security.

Péter Török answered Sep 20 '22 22:09


If you have an alternative implementation, then definitely use it.

In some situations, you can avoid writing an alternative implementation but still test the functionality in general. For instance, in your example, you could first generate a list of unique values, and then randomly duplicate elements before passing it to your implementation. You can then test that the output is equivalent to your starting list of unique values, without having to reimplement the set logic.

I try to take this approach whenever it's feasible.
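
A minimal sketch of the "unique values, then duplicate" idea, not a definitive implementation. The `IntSet` here is only a hypothetical stand-in for the class under test (backed by a `List<int>`), and `DuplicateInputTest`, the value range and the repeat count are all assumptions made for illustration. The seed is fixed so that the run can be repeated.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the class under test.
public class IntSet : IEnumerable<int>
{
    private readonly List<int> items = new List<int>();

    public void Add(int i)
    {
        if (!items.Contains(i)) items.Add(i);
    }

    public IEnumerator<int> GetEnumerator() => items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class DuplicateInputTest
{
    // Returns true when the set ends up holding exactly the unique values.
    public static bool OutputMatchesUniqueInput(int seed)
    {
        var random = new Random(seed); // fixed seed keeps the run repeatable

        // Unique by construction: 0, 1, ..., 19.
        List<int> unique = Enumerable.Range(0, 20).ToList();

        // Append randomly chosen repeats, then shuffle the whole input.
        var input = new List<int>(unique);
        for (int n = 0; n < 30; n++)
            input.Add(unique[random.Next(unique.Count)]);
        input = input.OrderBy(_ => random.Next()).ToList();

        var set = new IntSet();
        foreach (int i in input)
            set.Add(i);

        // Compare sorted sequences so enumeration order does not matter.
        return set.OrderBy(x => x).SequenceEqual(unique.OrderBy(x => x));
    }
}
```

In a test framework, the returned value would simply be wrapped in an assertion, for example `Assert.IsTrue(DuplicateInputTest.OutputMatchesUniqueInput(42))`. The expected result is known by construction, so no second set implementation is needed.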

Update: Essentially, I'm advocating the OP's "Option #2". With this approach, there's precisely one output vector that will allow the test to pass. With "Option #1", there's an infinite number of acceptable output vectors (it's testing an invariant, but it's not testing for any relationship to the input data).
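
To illustrate the difference, here is a small sketch with a deliberately broken set. `OnlyThreesSet` and `OptionChecks` are hypothetical names invented for this example; the two check methods only mirror the logic of the question's two options without depending on a test framework.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Deliberately broken: it remembers the value 3 and silently drops everything else.
public class OnlyThreesSet : IEnumerable<int>
{
    private bool hasThree;

    public void Add(int i)
    {
        if (i == 3) hasThree = true;
    }

    public IEnumerator<int> GetEnumerator()
    {
        if (hasThree) yield return 3;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class OptionChecks
{
    // Option 1: only checks that 3 appears exactly once.
    public static bool Option1(IEnumerable<int> set) =>
        set.Count(i => i == 3) == 1;

    // Option 2: checks the whole output against the distinct input values.
    public static bool Option2(IEnumerable<int> values, IEnumerable<int> set) =>
        set.OrderBy(x => x).SequenceEqual(values.Distinct().OrderBy(x => x));

    // Feeds the question's sample data into the broken set.
    public static OnlyThreesSet FillBrokenSet(IEnumerable<int> values)
    {
        var set = new OnlyThreesSet();
        foreach (int i in values)
            set.Add(i);
        return set;
    }
}
```

With the sample data `{1, 2, 3, 3, 3, 4, 5}`, the Option 1 check accepts this broken set because 3 appears exactly once, while the Option 2 check rejects it because 1, 2, 4 and 5 are missing.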

Oliver Charlesworth answered Sep 21 '22 22:09


Basically I'm asking whether my tests should look like "insert 1, 2, 3 then check something about 3" or "insert 1, 2, 3 and check for something in general"

I am not a TDD purist, but it seems people are saying that a test should break if the condition you are trying to test is broken, i.e. if you implement a test which checks a general condition, then your test will break in more than a few cases, so it is not optimal.

If I am testing that duplicates cannot be added, then I would test only that. So in this case, I would go with the first option.

(Update)

OK, now you have updated the code and I need to update my answer.

Which one would I choose? It depends on the implementation of CollectionAssert.AreEquivalent(new HashSet<int>(values), set). For example, IEnumerable<T> keeps the order while HashSet<T> does not, so even this could break the test when it should not. For me, the first option is still superior.

Aliostad answered Sep 20 '22 22:09


According to xUnit Test Patterns, it's usually more favorable to test the state of the system under test. If you want to test its behavior and the way in which the algorithm operates, you can use Mock Object Testing.

That being said, both of your tests are known as Data Driven Tests. What is usually acceptable is to use as much knowledge as the API provides. Remember, those tests also serve as documentation for your software. Therefore it's critical to keep them as simple as possible - whatever that means for your specific case.

Mike answered Sep 20 '22 22:09