Unit testing large blocks of code (mappings, translation, etc)

We unit test most of our business logic, but are stuck on how best to test some of our large service tasks and import/export routines. For example, consider the export of payroll data from one system to a 3rd party system. To export the data in the format the company needs, we have to hit ~40 tables, which makes creating test data and mocking out dependencies a nightmare.

For example, consider the following (a subset of ~3500 lines of export code):

public void ExportPaychecks()
{
   var pays = _pays.GetPaysForCurrentDate();
   foreach (PayObject pay in pays)
   {
      WriteHeaderRow(pay);
      if (pay.IsFirstCheck)
      {
         WriteDetailRowType1(pay);
      }
   }
}

private void WriteHeaderRow(PayObject pay)
{
   //do lots more stuff
}

private void WriteDetailRowType1(PayObject pay)
{
   //do lots more stuff
}

We only have the one public method in this particular export class - ExportPaychecks(). That's really the only action that makes any sense to someone calling this class ... everything else is private (~80 private functions). We could make them public for testing, but then we'd need to mock them to test each one separately (i.e., you can't test ExportPaychecks in a vacuum without mocking the WriteHeaderRow function). This is a huge pain too.

Since this is a single export, for a single vendor, moving logic into the Domain doesn't make sense. The logic has no domain significance outside of this particular class. As a test, we built out unit tests which had close to 100% code coverage ... but this required an insane amount of test data typed into stub/mock objects, plus over 7000 lines of code due to stubbing/mocking our many dependencies.

As a maker of HRIS software, we have hundreds of exports and imports. Do other companies REALLY unit test this type of thing? If so, are there any shortcuts to make it less painful? I'm half tempted to say "no unit testing the import/export routines" and just implement integration testing later.

Update - thanks for the answers, all. One thing I'd love to see is an example, as I'm still not seeing how someone can turn something like a large file export into an easily testable block of code without turning it into a mess.

asked Jan 16 '10 by Andrew




2 Answers

This style of (attempted) unit testing, where you try to cover an entire huge code base through a single public method, always reminds me of surgeons, dentists or gynaecologists who have to perform complex operations through small openings. Possible, but not easy.

Encapsulation is an old concept in object-oriented design, but some people take it to such extremes that testability suffers. There's another OO principle called the Open/Closed Principle that fits much better with testability. Encapsulation is still valuable, but not at the expense of extensibility - in fact, testability is really just another word for the Open/Closed Principle.

I'm not saying that you should make your private methods public, but what I am saying is that you should consider refactoring your application into composable parts - many small classes that collaborate instead of one big Transaction Script. You may think it doesn't make much sense to do this for a solution to a single vendor, but right now you are suffering, and this is one way out.

What will often happen when you split up a single method in a complex API is that you also gain a lot of extra flexibility. What started out as a one-off project may turn into a reusable library.


Here are some thoughts on how to approach a refactoring for the problem at hand. Every ETL application must perform at least these three steps:

  1. Extract data from the source
  2. Transform the data
  3. Load the data into the destination

(hence, the name ETL). As a start for refactoring, this gives us at least three classes with distinct responsibilities: Extractor, Transformer and Loader. Now, instead of one big class, you have three with more targeted responsibilities. Nothing messy about that, and already a bit more testable.
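For example, a minimal sketch of that split might look like this (every type and member name here is invented for illustration, not taken from the question's codebase):

using System.Collections.Generic;

// Hypothetical row types: in a real export these would model the
// source tables and the vendor's required file format.
public class PaySourceRow { /* source fields */ }
public class VendorPaycheckRow { /* destination fields */ }

public interface IExtractor
{
    IEnumerable<PaySourceRow> Extract();
}

public interface ITransformer
{
    IEnumerable<VendorPaycheckRow> Transform(IEnumerable<PaySourceRow> rows);
}

public interface ILoader
{
    void Load(IEnumerable<VendorPaycheckRow> rows);
}

// The export class becomes a thin orchestrator. Each collaborator can
// now be tested on its own, and the orchestration itself can be tested
// with small fakes instead of test data spread across 40 tables.
public class PaycheckExporter
{
    private readonly IExtractor _extractor;
    private readonly ITransformer _transformer;
    private readonly ILoader _loader;

    public PaycheckExporter(IExtractor extractor, ITransformer transformer, ILoader loader)
    {
        _extractor = extractor;
        _transformer = transformer;
        _loader = loader;
    }

    public void ExportPaychecks()
    {
        var sourceRows = _extractor.Extract();
        var vendorRows = _transformer.Transform(sourceRows);
        _loader.Load(vendorRows);
    }
}

With constructor injection like this, a test can pass a stub extractor that returns two or three hand-built rows instead of seeding dozens of tables.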

Now zoom in on each of these three areas and see where you can split up responsibilities even more.

  • At the very least, you will need a good in-memory representation of each 'row' of source data. If the source is a relational database, you may want to use an ORM, but if not, such classes need to be modeled so that they correctly protect the invariants of each row (e.g. if a field is non-nullable, the class should guarantee this by throwing an exception if a null value is attempted). Such classes have a well-defined purpose and can be tested in isolation.
  • The same holds true for the destination: You need a good object model for that.
  • If there's advanced application-side filtering going on at the source, you could consider implementing these using the Specification design pattern. Those tend to be very testable as well.
  • The Transform step is where a lot of the action happens, but now that you have good object models of both source and destination, transformation can be performed by Mappers - again testable classes.

If you have many 'rows' of source and destination data, you can further split this up into Mappers for each logical 'row', etc.
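To make those bullet points concrete, here is a hedged sketch of what the row model, a Specification and a Mapper could look like. Again, every name is invented; the row type simply fleshes out the placeholder from the earlier sketch:

using System;

// A source row that enforces its own invariants: a non-nullable field
// is guaranteed non-null from construction onward, so every consumer
// downstream can rely on it. Easy to test in isolation.
public class PaySourceRow
{
    public string EmployeeId { get; }
    public decimal GrossAmount { get; }
    public bool IsFirstCheck { get; }

    public PaySourceRow(string employeeId, decimal grossAmount, bool isFirstCheck)
    {
        EmployeeId = employeeId ?? throw new ArgumentNullException(nameof(employeeId));
        GrossAmount = grossAmount;
        IsFirstCheck = isFirstCheck;
    }
}

// Specification pattern: each filtering rule becomes a tiny class that
// can be tested with a handful of in-memory rows.
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

public class FirstCheckSpecification : ISpecification<PaySourceRow>
{
    public bool IsSatisfiedBy(PaySourceRow row) => row.IsFirstCheck;
}

// One destination row of the vendor's file format.
public class VendorDetailRow
{
    public string EmployeeId { get; set; }
    public string FormattedAmount { get; set; }
}

// A Mapper owns exactly one logical transformation.
public class DetailRowType1Mapper
{
    public VendorDetailRow Map(PaySourceRow source)
    {
        return new VendorDetailRow
        {
            EmployeeId = source.EmployeeId,
            FormattedAmount = source.GrossAmount.ToString("0.00")
        };
    }
}

A test for DetailRowType1Mapper now needs only a single in-memory PaySourceRow and an equality assertion - no database, no mocks.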

It never needs to become messy, and the added benefit (besides automated testing) is that the object model is now way more flexible. If you ever need to write another ETL application involving one of the two sides, you already have at least one third of the code written.

answered Oct 14 '22 by Mark Seemann


What you should have initially are integration tests. These will test that the functions perform as expected, and you can hit the actual database for them.

Once you have that safety net, you can start refactoring the code to be more maintainable and introducing unit tests.
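For example, a first test in that safety net could be a characterization (golden-file) test: run the existing, unchanged export against a known database state and diff the output against a previously captured baseline. A rough NUnit sketch, assuming (hypothetically) that the exporter can be pointed at a connection string and an output path:

using System.IO;
using NUnit.Framework;

[TestFixture]
public class PaycheckExportCharacterizationTests
{
    // Capture the export output once, commit it as the approved
    // baseline, and fail whenever refactoring changes the output.
    // The connection string and file paths are hypothetical.
    [Test]
    public void ExportPaychecks_MatchesApprovedBaseline()
    {
        var exporter = new PaycheckExporter("Server=test;Database=PayrollTest");

        string actualPath = Path.Combine(Path.GetTempPath(), "paychecks_actual.txt");
        exporter.ExportPaychecks(actualPath);

        string expected = File.ReadAllText(@"Baselines\paychecks_expected.txt");
        string actual = File.ReadAllText(actualPath);

        Assert.AreEqual(expected, actual);
    }
}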

As mentioned by serbrech, Working Effectively with Legacy Code will help you no end; I would strongly advise reading it, even for greenfield projects.

http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052

The main question I would ask is: how often does the code change? If it changes infrequently, is it really worth the effort of introducing unit tests? If it changes frequently, then I would definitely consider cleaning it up a bit.

answered Oct 14 '22 by Burt