Unit Testing - Algorithm or Sample based?

Tags: language-agnostic, c#, unit-testing

Say I'm trying to test a simple Set class

    public class IntSet : IEnumerable<int>
    {
        public void Add(int i) {...}
        //IEnumerable implementation...
    }

And suppose I'm trying to test that no duplicate values can exist in the set. My first option is to insert some sample data into the set, and test for duplicates using my knowledge of the data I used, for example:

    //OPTION 1
    void InsertDuplicateValues_OnlyOneInstancePerValueShouldBeInTheSet()
    {
        var set = new IntSet();

        //3 will be added 3 times
        var values = new List<int> {1, 2, 3, 3, 3, 4, 5};
        foreach (int i in values)
            set.Add(i);

        //I know 3 is the only candidate to appear multiple times
        int counter = 0;
        foreach (int i in set)
            if (i == 3) counter++;

        Assert.AreEqual(1, counter);
    }

My second option is to test for my condition generically:

    //OPTION 2
    void InsertDuplicateValues_OnlyOneInstancePerValueShouldBeInTheSet()
    {
        var set = new IntSet();

        //The following could even be a list of random numbers with a duplicate
        var values = new List<int> { 1, 2, 3, 3, 3, 4, 5};
        foreach (int i in values)
            set.Add(i);

        //I am not using my prior knowledge of the sample data 
        //the following line would work for any data
        CollectionAssert.AreEquivalent(new HashSet<int>(values), set);
    }

Of course, in this example, I conveniently have a set implementation to check against, as well as code to compare collections (CollectionAssert). But what if I didn't have either? The code would definitely be more complicated than that of the previous option! And this is the situation you are in when testing your real-life custom business logic.

Granted, testing for expected conditions generically covers more cases - but it becomes very similar to implementing the logic again (which is both tedious and useless - you can't use the same code to check itself!). Basically I'm asking whether my tests should look like "insert 1, 2, 3 then check something about 3" or "insert 1, 2, 3 and check for something in general".

EDIT - To help me understand, please state in your answer if you prefer OPTION 1 or OPTION 2 (or neither, or that it depends on the case, etc). Just to clarify, it's pretty clear that in this case (IntSet), option 2 is better in all aspects. However, my question pertains to the cases where you don't have an alternative implementation to check against, so the code in option 2 would be definitely more complicated than option 1.
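
To make the trade-off concrete, a hand-rolled version of the Option 2 check, written without HashSet<int> or CollectionAssert, might look roughly like the sketch below. The names GenericSetChecks and HoldsEachInputExactlyOnce are hypothetical and not part of any framework; the point is only to show how quickly the generic check starts to resemble the logic under test.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of any test framework.
public static class GenericSetChecks
{
    // True when 'actual' holds every distinct input value exactly once
    // and contains no value that was never inserted.
    public static bool HoldsEachInputExactlyOnce(IEnumerable<int> inputs, IEnumerable<int> actual)
    {
        var inputList = new List<int>(inputs);
        var output = new List<int>(actual);

        // Every inserted value must occur exactly once in the output.
        foreach (int value in inputList)
        {
            if (Count(output, value) != 1) return false;
        }

        // Every output value must have been inserted at some point.
        foreach (int value in output)
        {
            if (Count(inputList, value) == 0) return false;
        }
        return true;
    }

    // Counts how many times 'target' occurs in 'items'.
    private static int Count(List<int> items, int target)
    {
        int n = 0;
        foreach (int item in items)
        {
            if (item == target) n++;
        }
        return n;
    }
}
```

In a test this would be used as Assert.IsTrue(GenericSetChecks.HoldsEachInputExactlyOnce(values, set)), and the helper itself would arguably need tests of its own.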


asked Jan 17 '11 13:01

Ohad Schneider

4 Answers

I usually prefer to test use cases one by one - this works nicely in the TDD manner: "code a little, test a little". Of course, after a while my test cases start to contain duplicated code, so I refactor. The actual method of verifying the results does not matter to me as long as it reliably works and doesn't get in the way of testing itself. So if there is a "reference implementation" to test against, all the better.

An important thing, however, is that the tests should be reproducible and it should be clear what each test method is actually testing. To me, inserting random values into a collection is neither - of course, if there is a huge amount of data/use cases involved, every tool or approach is welcome which helps to handle the situation better without lulling me into a false sense of security.
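
If random data is used anyway, one common way to keep a test reproducible is to fix the seed of the generator. The following is a minimal sketch under that assumption; SampleData and WithDuplicate are hypothetical names, and the repeatability holds for a given .NET runtime rather than being a cross-version guarantee.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class SampleData
{
    // Builds 'count' pseudo-random values in [0, 100) and then repeats the first one,
    // so the returned list always contains at least one duplicate.
    public static List<int> WithDuplicate(int seed, int count)
    {
        if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));

        var random = new Random(seed); // fixed seed: the same seed yields the same sequence
        var values = new List<int>();
        for (int i = 0; i < count; i++)
        {
            values.Add(random.Next(0, 100));
        }
        values.Add(values[0]); // guaranteed duplicate
        return values;
    }
}
```

Logging the seed in the failure message lets a failing run be replayed with exactly the same data, which addresses reproducibility but not the concern about clarity.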


answered Sep 20 '22 22:09

Péter Török

If you have an alternative implementation, then definitely use it.

In some situations, you can avoid writing an alternative implementation but still test the functionality in general. For instance, in your example, you could first generate a set of unique values, and then randomly duplicate elements before passing it to your implementation. You can then test that the output is equivalent to your starting vector, without having to reimplement the set.

I try to take this approach whenever it's feasible.

Update: Essentially, I'm advocating the OP's "Option #2". With this approach, there's precisely one output vector that will allow the test to pass. With "Option #1", there's an infinite number of acceptable output vectors (it's testing an invariant, but it's not testing for any relationship to the input data).
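
A minimal sketch of the "start from unique values, then inject duplicates" idea is shown below. DuplicateInjection and WithRandomDuplicates are hypothetical names, and the fixed seed is an added assumption to keep runs repeatable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class DuplicateInjection
{
    // Returns every value in 'uniqueValues' plus 'extraCopies' randomly chosen repeats,
    // in shuffled order. 'uniqueValues' is left untouched and serves as the expected output.
    public static List<int> WithRandomDuplicates(IReadOnlyList<int> uniqueValues, int extraCopies, int seed)
    {
        if (uniqueValues.Count == 0) throw new ArgumentException("need at least one value", nameof(uniqueValues));

        var random = new Random(seed);
        var result = new List<int>(uniqueValues);
        for (int i = 0; i < extraCopies; i++)
        {
            result.Add(uniqueValues[random.Next(uniqueValues.Count)]);
        }

        // Fisher-Yates shuffle so the repeats are not all at the end.
        for (int i = result.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            int tmp = result[i];
            result[i] = result[j];
            result[j] = tmp;
        }
        return result;
    }
}
```

The test would then add each element of the returned list to the set under test and assert that the set's contents are equivalent to the original unique values, for instance with CollectionAssert.AreEquivalent as in Option 2.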

answered Sep 21 '22 22:09

Oliver Charlesworth

> Basically I'm asking whether my tests should look like "insert 1, 2, 3 then check something about 3" or "insert 1, 2, 3 and check for something in general"

I am not a TDD purist, but it seems people are saying that a test should break if the condition you are trying to test is broken, i.e. if you implement a test which checks a general condition, it will break in more than a few cases, so it is not optimal.

If I am testing that duplicates cannot be added, then I would test only that. So in this case, I would go with the first option.

(Update)

OK, now you have updated the code and I need to update my answer.

Which one would I choose? It depends on how CollectionAssert.AreEquivalent(new HashSet<int>(values), set) is implemented. For example, an IEnumerable<T> may preserve insertion order while HashSet<T> does not guarantee any order, so if the comparison were order-sensitive the test could fail when it should not. For me the first option is still superior.
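
If enumeration order is the worry, one way to take the framework's behavior out of the picture is to sort both sides before comparing. The sketch below assumes that approach; OrderInsensitive and SameElements are hypothetical names. Unlike HashSet<T>.SetEquals, which ignores duplicates in its argument, this comparison also fails when the set under test yields a value twice.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration.
public static class OrderInsensitive
{
    // True when both sequences contain the same values the same number of times,
    // regardless of the order in which they are enumerated.
    public static bool SameElements(IEnumerable<int> expected, IEnumerable<int> actual)
    {
        return expected.OrderBy(x => x).SequenceEqual(actual.OrderBy(x => x));
    }
}
```

For the example in the question, the call would be OrderInsensitive.SameElements(values.Distinct(), set).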

answered Sep 20 '22 22:09

Aliostad

According to xUnit Test Patterns, it's usually more favorable to test the state of the system under test. If you want to test its behavior and the way in which the algorithm operates, you can use Mock Object Testing.

That being said, both of your tests are known as Data Driven Tests. What is usually acceptable is to use as much knowledge as the API provides. Remember, those tests also serve as documentation for your software. Therefore it's critical to keep them as simple as possible - whatever that means for your specific case.

answered Sep 20 '22 22:09

Mike
