 

How to write good Unit Tests in Functional Programming

I'm using functions instead of classes, and I find that I can't tell whether a function that another function relies on is a dependency that should be unit-tested individually or an internal implementation detail that should not be. How can you tell which one it is?

A little context: I'm writing a very simple Lisp interpreter that has an eval() function. It's going to have a lot of responsibilities (too many, actually), such as evaluating symbols differently from lists (everything else evaluates to itself). Evaluating a symbol has its own complex workflow (environment lookup), and evaluating a list is even more complicated, since the list can be a macro, function, or special form, each of which has its own complex workflow and set of responsibilities.
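Roughly, the dispatch I have in mind looks like the sketch below (a minimal Python illustration with assumed names like Symbol; not my actual code):

```python
# Minimal sketch of the dispatch structure (hypothetical representation:
# symbols as a Symbol subclass of str, environments as chained dicts).

class Symbol(str):
    """A bare identifier, distinct from a string literal."""

def eval(expr, env):
    if isinstance(expr, Symbol):
        return eval_symbol(expr, env)   # environment-lookup workflow
    if isinstance(expr, list):
        return eval_list(expr, env)     # macro / function / special-form workflow
    return expr                         # everything else evaluates to itself

def eval_symbol(symbol, env):
    ...  # walk the environment chain; error if the symbol is unbound

def eval_list(lst, env):
    ...  # dispatch on special form vs. macro vs. function call
```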

I can't tell if my eval_symbol() and eval_list() functions should be considered internal implementation details of eval() which should be tested through eval()'s own unit tests, or genuine dependencies in their own right which should be unit-tested independently of eval()'s unit tests.

Steven, asked Aug 06 '11



2 Answers

A significant motivation for the "unit test" concept is to control the combinatorial explosion of required test cases. Let's look at the examples of eval, eval_symbol and eval_list.

In the case of eval_symbol, we will want to test contingencies where the symbol's binding is:

  • missing (i.e. the symbol is unbound)

  • in the global environment

  • directly within the current environment

  • inherited from a containing environment

  • shadowing another binding

  • ... and so on

In the case of eval_list, we will want to test (among other things) what happens when the list's function position contains a symbol with:

  • no function or macro binding

  • a function binding

  • a macro binding

eval_list will invoke eval_symbol whenever it needs a symbol's binding (assuming a LISP-1, that is). Let's say that there are S test cases for eval_symbol and L symbol-related test cases for eval_list. If we test each of these functions separately, we could get away with roughly S + L symbol-related test cases. However, if we wish to treat eval_list as a black box and to test it exhaustively without any knowledge that it uses eval_symbol internally, then we are faced with S x L symbol-related test cases (e.g. global function binding, global macro binding, local function binding, local macro binding, inherited function binding, inherited macro binding, and so on). That's a lot more cases. eval is even worse: as a black box the number of combinations can become incredibly large -- hence the term combinatorial explosion.
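To make the arithmetic concrete, the two separate suites might look roughly like this (a hypothetical pytest-style sketch; the interpreter module and helpers like make_env are assumed names, not anything from the question):

```python
import pytest
# Hypothetical sketch; Symbol, eval_symbol, eval_list and make_env are
# assumed to come from the interpreter under test.
from interpreter import Symbol, eval_symbol, eval_list, make_env

# S cases: each binding contingency for eval_symbol is tested exactly once.
def test_symbol_unbound_raises():
    with pytest.raises(NameError):
        eval_symbol(Symbol("x"), make_env())

def test_symbol_global_binding():
    env = make_env(globals_={"x": 42})
    assert eval_symbol(Symbol("x"), env) == 42

def test_symbol_local_shadows_global():
    env = make_env(globals_={"x": 1}, locals_={"x": 2})
    assert eval_symbol(Symbol("x"), env) == 2

# L cases: eval_list is tested only for what it adds on top, trusting
# eval_symbol to resolve the function-position symbol correctly.
def test_list_calls_function_binding():
    env = make_env(globals_={"inc": lambda n: n + 1})
    assert eval_list([Symbol("inc"), 41], env) == 42

def test_list_unbound_operator_raises():
    with pytest.raises(NameError):
        eval_list([Symbol("nope"), 1], make_env())

# Written this way, we need roughly S + L cases rather than S x L.
```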

So, we are faced with a choice of theoretical purity versus actual practicality. There is no doubt that a comprehensive set of test cases that exercises only the "public API" (in this case, eval) gives the greatest confidence that there are no bugs. After all, by exercising every possible combination we may turn up subtle integration bugs. However, the number of such combinations may be so prohibitively large as to preclude such testing. Not to mention that the programmer will probably make mistakes (or go insane) reviewing vast numbers of test cases that only differ in subtle ways. By unit-testing the smaller internal components, one can vastly reduce the number of required test cases while still retaining a high level of confidence in the results -- a practical solution.

So, I think the guideline for identifying the granularity of unit testing is this: if the number of test cases is uncomfortably large, start looking for smaller units to test.

In the case at hand, I would absolutely advocate testing eval, eval_list and eval_symbol as separate units precisely because of the combinatorial explosion. When writing the tests for eval_list, you can rely upon eval_symbol being rock solid and confine your attention to the functionality that eval_list adds in its own right. There are likely other testable units within eval_list as well, such as eval_function, eval_macro, eval_lambda, eval_arglist and so on.
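One plausible decomposition along those lines is sketched below (helper names such as is_special_form and eval_macro are assumptions for illustration, not the asker's code):

```python
def eval_list(lst, env):
    head, *args = lst
    if is_special_form(head):                  # e.g. quote, if, lambda, define
        return eval_special_form(head, args, env)
    op = eval(head, env)                       # symbols resolve via eval_symbol
    if is_macro(op):
        expansion = eval_macro(op, args, env)  # expand with unevaluated args,
        return eval(expansion, env)            # then evaluate the expansion
    operands = [eval(a, env) for a in args]    # ordinary call: evaluate args
    return eval_function(op, operands, env)
```

Each helper (eval_special_form, eval_macro, eval_function) then becomes a small unit with its own additive set of test cases.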

WReach, answered Oct 10 '22


My advice is quite simple: "Start somewhere!"

  • If you see a name of some def (or defun) that looks like it might be fragile, well, you probably want to test it, don't you?
  • If you're having some trouble trying to figure out how your client code can interface with some other code unit, well, you probably want to write some tests somewhere that let you create examples of how to properly use that function.
  • If some function seems sensitive to data values, well, you might want to write some tests that not only verify it can handle any reasonable inputs properly, but also specifically exercise boundary conditions and odd or unusual data inputs (see the sketch after this list).
  • Whatever seems bug-prone should have tests.
  • Whatever seems unclear should have tests.
  • Whatever seems complicated should have tests.
  • Whatever seems important should have tests.
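
As a concrete illustration of that boundary-condition point, tests like the following (again hypothetical, pytest-style, with assumed names) deliberately poke the edges of the input space:

```python
import pytest
from interpreter import Symbol, eval, make_env   # assumed module and API

def test_empty_list_is_an_error():
    # Decide up front what () should mean and pin the decision down;
    # here we assume the interpreter treats it as a syntax error.
    with pytest.raises(SyntaxError):
        eval([], make_env())

def test_deeply_nested_calls():
    env = make_env(globals_={"identity": lambda x: x})
    expr = [Symbol("identity"), [Symbol("identity"), [Symbol("identity"), 7]]]
    assert eval(expr, env) == 7

def test_symbol_name_that_looks_numeric():
    # "1+" is a legal Lisp symbol; make sure symbol handling agrees.
    env = make_env(globals_={"1+": lambda n: n + 1})
    assert eval([Symbol("1+"), 41], env) == 42
```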

Later, you can go about increasing your coverage to 100%. But you'll find that you will probably get 80% of your real results from the first 20% of your unit test coding (Inverted "Law of the Critical Few").

So, to review the main point of my humble approach, "Start somewhere!"

Regarding the last part of your question, I would recommend thinking about any possible recursion, as well as any additional reuse by "client" functions that you or subsequent developers might create in the future that would also call eval_symbol() or eval_list().

Regarding recursion, the functional programming style uses it a lot, and it can be difficult to get right, especially for those of us who come from procedural or object-oriented programming, where recursion is rarely encountered. The best way to get recursion right is to target any recursive features precisely with unit tests, so that all the recursive use cases are validated.
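For instance, a couple of tests that deliberately force evaluation through nested expressions will exercise the recursion directly (a hypothetical pytest-style sketch; eval here is the interpreter's eval, and make_env is an assumed constructor):

```python
from interpreter import Symbol, eval, make_env   # assumed module and API

def test_nested_call_recurses_into_arguments():
    env = make_env(globals_={"+": lambda a, b: a + b})
    # (+ 1 (+ 2 3)) forces eval -> eval_list -> eval on the inner list
    expr = [Symbol("+"), 1, [Symbol("+"), 2, 3]]
    assert eval(expr, env) == 6

def test_deep_nesting_stays_correct():
    env = make_env(globals_={"+": lambda a, b: a + b})
    expr = 0
    for _ in range(200):                 # builds (+ 1 (+ 1 (+ 1 ... 0)))
        expr = [Symbol("+"), 1, expr]
    assert eval(expr, env) == 200
```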

Regarding reuse, if your functions are likely to be invoked by anything other than a single use by your eval() function, they should probably be treated as genuine dependencies that deserve independent unit tests.

As a final hint, the term "unit" has a technical definition in the domain of unit testing: "the smallest piece of software that can be tested in isolation." That is a very old, fundamental definition that may quickly clarify your situation for you.

John Tobler, answered Oct 10 '22