
Should I test (duplicate) data, or only the behavior?

From a design perspective, I am wondering whether I should test the data, especially if it is generally known data (not something very configurable). This can apply to things like popular file extensions, special IP addresses, etc.

Suppose we have an emergency phone number classifier:

public class ContactClassifier {

    public final static String EMERGENCY_PHONE_NUMBER = "911";

    public boolean isEmergencyNumber(String number) {
        return number.equals(EMERGENCY_PHONE_NUMBER);
    }
}

Should I test it this way ("911" duplication):

@Test
public void testClassifier() {
    assertTrue(contactClassifier.isEmergencyNumber("911"));
    assertFalse(contactClassifier.isEmergencyNumber("111-other-222"));
}

or this way (testing that the "configured" number is properly recognized):

@Test
public void testClassifier() {
    assertTrue(contactClassifier.isEmergencyNumber(ContactClassifier.EMERGENCY_PHONE_NUMBER));
    assertFalse(contactClassifier.isEmergencyNumber("111-other-222"));
}

Or should I inject "911" in the constructor, which looks the most reasonable to me? But even if I do so, should I write a test for the "application glue" to check that the component was instantiated with the proper value? If someone can make a typo in the data (code), then I see no reason why someone couldn't make the same typo in the test case (I bet such data would be copy-pasted).
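For reference, the constructor-injection variant could look roughly like the sketch below. The class name InjectedContactClassifier is hypothetical and only used here to keep it apart from the original ContactClassifier.

```java
// Hypothetical variant of ContactClassifier: the emergency number is
// injected through the constructor instead of being a hard-coded constant.
class InjectedContactClassifier {

    private final String emergencyNumber;

    InjectedContactClassifier(String emergencyNumber) {
        this.emergencyNumber = emergencyNumber;
    }

    boolean isEmergencyNumber(String number) {
        // equals() is called on the injected value, so a null argument
        // simply yields false instead of throwing.
        return emergencyNumber.equals(number);
    }
}
```

With this shape a unit test can exercise the behavior with any value it likes, and "911" appears only where the component is wired up.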

Piotr Müller asked Nov 13 '17 09:11



2 Answers

What is the point of testing data? To check that a constant value is in fact a constant value? It is already defined in code. Java makes sure that the value is in fact the value, so don't bother.

What you should do in a unit test is test the implementation: whether it is correct or not. To test incorrect behaviour, define data inside the test, marked as wrong, and pass it to the method. To test the correct case, either supply the data in the test itself (if it is a boundary value that is not well known), or use application-wide known values (constants in interfaces) if they are already defined somewhere.

What is bothering you is that data that should be well known to everyone is placed in the test, and that is not correct at all. What you can do is move it to the interface level. This way, by design, your application's well-known data is part of the contract, and every reference to it is checked by the Java compiler.
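A minimal sketch of what "moving it to the interface level" could mean. The names EmergencyNumbers and InterfaceBackedClassifier are hypothetical and chosen only for illustration:

```java
// Hypothetical contract that holds application-wide well-known values.
interface EmergencyNumbers {
    // Fields declared in an interface are implicitly public static final.
    String EMERGENCY_PHONE_NUMBER = "911";
}

// Hypothetical classifier that reads the value from the contract.
class InterfaceBackedClassifier {

    boolean isEmergencyNumber(String number) {
        // Calling equals() on the constant keeps a null argument safe.
        return EmergencyNumbers.EMERGENCY_PHONE_NUMBER.equals(number);
    }
}
```

Both the production code and the tests would then refer to EmergencyNumbers.EMERGENCY_PHONE_NUMBER, so the value is defined in exactly one place.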

Well-known values should not be tested; they should be maintained through interfaces of some sort. Changing such a value is easy, yes, and your test will not fail when it changes, but to avoid accidents you should have merge requests, reviews and the tasks associated with them. If someone changes it by accident, you can catch that in code review. If you commit everything straight to master, you have bigger problems than doubly defined constants.

Now, onto parts that are bothering you in other approaches:

1) "If someone can make a typo in the data (code), then I see no reason why someone couldn't make the same typo in the test case (I bet such data would be copy-pasted)"

Actually, if someone changes the value in the data and then continues to develop, at some point they will run a clean install and see the failed tests. At that point they will probably change or ignore the test to make it pass. If you have a person who changes data that randomly, you have bigger issues; and if the change is defined by a task, you have made someone make the same change twice (at least). No pros and many cons.

2) Worrying about someone making a mistake is generally bad practice. You can't catch it using code; code reviews are designed for that. You can worry, though, about someone not correctly using the interface you defined.

3) Should I test it this way:

@Test
public void testClassifier() {
    assertTrue(contactClassifier.isEmergencyNumber(ContactClassifier.EMERGENCY_PHONE_NUMBER));
    assertFalse(contactClassifier.isEmergencyNumber("111-other-222"));
}

Not this way either. This is not a single test but a batch of tests, i.e. multiple tests in the same method. It should look like this (following naming conventions):

@Test
public void testClassifier_emergencyNumberSupplied_correctnessConfirmed() {
    assertTrue(contactClassifier.isEmergencyNumber(ContactClassifier.EMERGENCY_PHONE_NUMBER));
}

@Test
public void testClassifier_incorrectValueSupplied_correctnessNotConfirmed() {
    assertFalse(contactClassifier.isEmergencyNumber("111-other-222"));
}

4) This is not necessary when the test method is properly named, but if the test is long enough you might consider naming the values inside the test. For example:

@Test
public void testClassifier_incorrectValueSupplied_correctnessNotConfirmed() {
    String nonEmergencyNumber = "111-other-222";
    assertFalse(contactClassifier.isEmergencyNumber(nonEmergencyNumber));
}

Andrii Plotnikov answered Sep 29 '22 03:09


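Externally defined constants have a problem of their own. A compile-time constant (a static final primitive or String initialized with a constant expression) is inlined: the reference to the defining class disappears and the value is copied into the constant pool of the class that uses it. Hence, when the constant is later changed in the original class, the compiler sees no dependency between the .class files and leaves the old constant value in the test class.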

So you would need a clean build.

Furthermore, tests should be short, clear to read and fast to write. Tests deal with concrete cases of data. Abstractions are counter-productive and may even lead to errors in the tests themselves. Constants (like a speed limit) should be etched in stone: they should be literals. Value properties, like the maximum velocity of a car brand, can stem from some kind of table lookup.

Of course, repeated values could be placed in local constants. This prevents typos, is an easy (because local) abstraction, and clarifies the semantic meaning of a value.

However, as test cases will generally use a value maybe two or three times (positive and negative tests), I would go for bare literals.
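The two options can be sketched side by side. This is only an illustration: it uses plain static check methods instead of JUnit so that it runs without dependencies, and ContactClassifierChecks, EMERGENCY, NON_EMERGENCY and check() are hypothetical names.

```java
// Copy of the classifier from the question (package-private here),
// with equals() called on the constant so a null argument yields false.
class ContactClassifier {

    static final String EMERGENCY_PHONE_NUMBER = "911";

    boolean isEmergencyNumber(String number) {
        return EMERGENCY_PHONE_NUMBER.equals(number);
    }
}

class ContactClassifierChecks {

    // Option 1: constants local to the test class name the concrete values.
    private static final String EMERGENCY = "911";
    private static final String NON_EMERGENCY = "111-other-222";

    static void emergencyNumberIsRecognized() {
        check(new ContactClassifier().isEmergencyNumber(EMERGENCY));
    }

    static void otherNumberIsRejected() {
        check(!new ContactClassifier().isEmergencyNumber(NON_EMERGENCY));
    }

    // Option 2: a bare literal, used directly where the value is needed.
    static void emergencyNumberIsRecognizedWithLiteral() {
        check(new ContactClassifier().isEmergencyNumber("911"));
    }

    // Minimal stand-in for assertTrue.
    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("check failed");
        }
    }
}
```

Neither option references ContactClassifier.EMERGENCY_PHONE_NUMBER from the test, so the stale-constant issue described above does not arise.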

Joop Eggen answered Sep 29 '22 03:09