How to correctly achieve test isolation with a stateful Python module?

The project I'm working on is business-logic software wrapped up as a Python package. The idea is that various scripts or applications will import it, initialize it, and then use it.

It currently has a top-level init() method that does the initialization and sets up various things. A good example: it sets up SQLAlchemy with a db connection and stores the SA session for later access in a subpackage of my project (namely myproj.model.Session), so other code can get a working SA session after importing the model.
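To make the setup concrete, here is a minimal sketch of what such a stateful init() might look like. Only myproj.model.Session is named in the question; the init() signature, the db_url parameter, and the model.engine attribute are illustrative assumptions.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

from myproj import model


def init(db_url):
    """Configure the package once: create the engine and publish the session."""
    model.engine = create_engine(db_url)
    # Module-level state: everything that imports myproj.model sees this Session.
    model.Session = scoped_session(sessionmaker(bind=model.engine))
```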

Long story short, this makes my package a stateful one. I'm writing unit tests for the project and this stateful behaviour poses some problems:

  1. tests should be isolated, but the internal state of my package breaks this isolation
  2. I cannot test the main init() method since its behavior depends on the state
  3. future tests will need to be run against the (not yet written) controller part with a well-known model state (e.g. a pre-populated SQLite in-memory db)

Should I somehow refactor my package because the current structure is not the Best (possible) Practice(tm)? :)

Should I leave it as it is and set up / tear down the whole thing every time? If I'm going to achieve complete isolation, that would mean fully erasing and re-populating the db for every single test; isn't that overkill?

This question is really about the overall code & test structure, but for what it's worth I'm using nose 1.0 for my tests. I know the Isolate plugin could probably help me, but I'd like to get the code right before doing strange things in the test suite.

asked Apr 15 '11 by Luke404
2 Answers

You have a few options:

Mock the database

There are a few trade-offs to be aware of.

Your tests will become more complex, since you will have to do the setup, teardown and mocking of the connection. You may also want to verify the SQL/commands that get sent. Mocking also tends to create an odd sort of tight coupling, which may cause you to spend additional time maintaining/updating tests when the schema or SQL changes.

On the other hand, this is usually the purest form of test isolation, because it removes a potentially large dependency from testing. It also tends to make tests faster and reduces the overhead of automating the test suite in, say, a continuous integration environment.
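A minimal sketch of the mocking approach, assuming the business code obtains its session by calling myproj.model.Session() as in the earlier sketch; the create_user() function being tested is purely illustrative, not something from the question.

```python
from unittest import mock  # the standalone `mock` package on nose-era Python 2 setups

import myproj
from myproj import model


def test_create_user_adds_and_commits():
    fake_session = mock.MagicMock()
    # Replace the module-level Session factory so no real db is touched.
    with mock.patch.object(model, "Session", return_value=fake_session):
        myproj.create_user(name="alice")  # hypothetical business-logic call

    # Verify the commands sent to the (fake) session.
    fake_session.add.assert_called_once()
    fake_session.commit.assert_called_once()
```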

Recreate the DB with each Test

Trade-offs to be aware of:

This can make your tests very slow, depending on how long it actually takes to recreate your database. If the dev database server is a shared resource, there will have to be additional up-front investment in making sure each dev has their own db on the server, and the server may suffer depending on how often tests get run. There is also additional overhead to running your test suite in a continuous integration environment, because it will need at least one database, possibly more (depending on how many branches are being built simultaneously).

The benefit is that you actually run through the same code paths and similar resources that will be used in production. This usually helps reveal bugs earlier, which is always a very good thing.
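A rough sketch of per-test recreation, reusing the init()/engine/metadata names assumed in the earlier sketch (model.metadata and the test database URL are likewise assumptions, not from the question):

```python
import unittest

import myproj
from myproj import model

TEST_DB_URL = "postgresql://test_user:test_pw@localhost/myproj_test"  # illustrative


class ControllerTestCase(unittest.TestCase):
    def setUp(self):
        myproj.init(TEST_DB_URL)
        model.metadata.create_all(bind=model.engine)  # fresh, empty schema
        self._load_fixtures()                         # well-known model state

    def tearDown(self):
        model.Session.remove()                        # discard any session state
        model.metadata.drop_all(bind=model.engine)    # erase everything again

    def _load_fixtures(self):
        session = model.Session()
        # insert whatever rows the tests in this case expect, then commit
        session.commit()
```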

ORM DB swap

If you're using an ORM like SQLAlchemy, there is a possibility that you can swap the underlying database for a potentially faster in-memory database. This lets you mitigate some of the negatives of both previous options.

It's not quite the same database as will be used in production, so there is some risk that the difference obscures a bug, but the ORM should help mitigate that. Setting up an in-memory database is typically much faster than setting up a file-backed one. It also has the benefit of being isolated to the current test run, so you don't have to worry about shared resource management or final teardown/cleanup.
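A sketch of the in-memory swap, again reusing the init()/metadata assumptions from the earlier sketches; nose picks up module-level setup_module/teardown_module fixtures automatically.

```python
import myproj
from myproj import model


def setup_module():
    myproj.init("sqlite://")  # in-memory SQLite database, one per test run
    model.metadata.create_all(bind=model.engine)


def teardown_module():
    model.Session.remove()
    # no drop_all() needed: the in-memory database vanishes with the connection
```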

answered by dietbuddha


Working on a project with a relatively expensive setup (IPython), I've seen an approach where we call a get_ipython function, which sets up and returns an instance while replacing itself with a function that returns a reference to the existing instance. Every test can then call the same function, but only the first call does the setup.

That saves doing a long setup procedure for every test, but occasionally it creates odd cases where a test fails or passes depending on which tests were run before it. We have ways of dealing with that: a lot of the tests should do the same thing regardless of the state, and we can reset the object's state before certain tests. You might find a similar trade-off works for you.
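A rough sketch of the idea, using a module-level cache rather than literally replacing the function (the effect is the same), with an illustrative ExpensiveThing class standing in for the real object; none of these names come from IPython's actual implementation.

```python
class ExpensiveThing:
    """Stand-in for an object that is costly to construct (e.g. an IPython shell)."""

    def __init__(self):
        self.state = {}

    def reset(self):
        """Restore a known state so order-dependent tests get a clean slate."""
        self.state.clear()


_instance = None


def get_expensive_thing():
    """First call builds the instance; every later call returns the same one."""
    global _instance
    if _instance is None:
        _instance = ExpensiveThing()
    return _instance
```

Tests that depend on a known state can call get_expensive_thing().reset() in their setup, while the majority just reuse whatever instance already exists.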

answered by Thomas K