Unit testing large blocks of code (mappings, translation, etc)

We unit test most of our business logic, but are stuck on how best to test some of our large service tasks and import/export routines. For example, consider the export of payroll data from one system to a 3rd party system. To export the data in the format the company needs, we have to hit ~40 tables, which makes creating test data and mocking out dependencies a nightmare.

For example, consider the following (a subset of ~3500 lines of export code):

public void ExportPaychecks()
{
   var pays = _pays.GetPaysForCurrentDate();
   foreach (PayObject pay in pays)
   {
      WriteHeaderRow(pay);
      if (pay.IsFirstCheck)
      {
         WriteDetailRowType1(pay);
      }
   }
}

private void WriteHeaderRow(PayObject pay)
{
   //do lots more stuff
}

private void WriteDetailRowType1(PayObject pay)
{
   //do lots more stuff
}

We only have the one public method in this particular export class - ExportPaychecks(). That's really the only action that makes any sense to someone calling this class ... everything else is private (~80 private functions). We could make them public for testing, but then we'd need to mock them to test each one separately (i.e., you can't test ExportPaychecks in a vacuum without mocking the WriteHeaderRow function). This is a huge pain too.

Since this is a single export, for a single vendor, moving logic into the Domain doesn't make sense. The logic has no domain significance outside of this particular class. As a test, we built out unit tests which had close to 100% code coverage ... but this required an insane amount of test data typed into stub/mock objects, plus over 7000 lines of code due to stubbing/mocking our many dependencies.

As a maker of HRIS software, we have hundreds of exports and imports. Do other companies REALLY unit test this type of thing? If so, are there any shortcuts to make it less painful? I'm half tempted to say "no unit testing the import/export routines" and just implement integration testing later.

Update - thanks for the answers, all. One thing I'd love to see is an example, as I'm still not seeing how someone can turn something like a large file export into an easily testable block of code without turning it into a mess.

asked Jan 16 '10 by Andrew




2 Answers

This style of (attempted) unit testing, where you try to cover an entire huge code base through a single public method, always reminds me of surgeons, dentists or gynaecologists who have to perform complex operations through small openings. Possible, but not easy.

Encapsulation is an old concept in object-oriented design, but some people take it to such extremes that testability suffers. There's another OO principle called the Open/Closed Principle that fits much better with testability. Encapsulation is still valuable, but not at the expense of extensibility - in fact, testability is really just another word for the Open/Closed Principle.

I'm not saying that you should make your private methods public, but what I am saying is that you should consider refactoring your application into composable parts - many small classes that collaborate instead of one big Transaction Script. You may think it doesn't make much sense to do this for a solution to a single vendor, but right now you are suffering, and this is one way out.

What will often happen when you split up a single method in a complex API is that you also gain a lot of extra flexibility. What started out as a one-off project may turn into a reusable library.


Here are some thoughts on how to approach a refactoring for the problem at hand. Every ETL application must perform at least these three steps:

  1. Extract data from the source
  2. Transform the data
  3. Load the data into the destination

(hence, the name ETL). As a start for refactoring, this gives us at least three classes with distinct responsibilities: Extractor, Transformer and Loader. Now, instead of one big class, you have three with more targeted responsibilities. Nothing messy about that, and already a bit more testable.
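For example, a minimal sketch of that split might look like this (every type and member name here is invented for illustration, not taken from the question's codebase):

using System.Collections.Generic;

// Hypothetical row types: in a real export these would model the
// source tables and the vendor's required file format.
public class PaySourceRow { /* source fields */ }
public class VendorPaycheckRow { /* destination fields */ }

public interface IExtractor
{
    IEnumerable<PaySourceRow> Extract();
}

public interface ITransformer
{
    IEnumerable<VendorPaycheckRow> Transform(IEnumerable<PaySourceRow> rows);
}

public interface ILoader
{
    void Load(IEnumerable<VendorPaycheckRow> rows);
}

// The export class becomes a thin orchestrator. Each collaborator can
// now be tested on its own, and the orchestration itself can be tested
// with small fakes instead of test data spread across 40 tables.
public class PaycheckExporter
{
    private readonly IExtractor _extractor;
    private readonly ITransformer _transformer;
    private readonly ILoader _loader;

    public PaycheckExporter(IExtractor extractor, ITransformer transformer, ILoader loader)
    {
        _extractor = extractor;
        _transformer = transformer;
        _loader = loader;
    }

    public void ExportPaychecks()
    {
        var sourceRows = _extractor.Extract();
        var vendorRows = _transformer.Transform(sourceRows);
        _loader.Load(vendorRows);
    }
}

With constructor injection like this, a test can pass a stub extractor that returns two or three hand-built rows instead of seeding dozens of tables.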

Now zoom in on each of these three areas and see where you can split up responsibilities even more.

  • At the very least, you will need a good in-memory representation of each 'row' of source data. If the source is a relational database, you may want to use an ORM, but if not, such classes need to be modeled so that they correctly protect the invariants of each row (e.g. if a field is non-nullable, the class should guarantee this by throwing an exception if a null value is attempted). Such classes have a well-defined purpose and can be tested in isolation.
  • The same holds true for the destination: You need a good object model for that.
  • If there's advanced application-side filtering going on at the source, you could consider implementing these using the Specification design pattern. Those tend to be very testable as well.
  • The Transform step is where a lot of the action happens, but now that you have good object models of both source and destination, transformation can be performed by Mappers - again testable classes.

If you have many 'rows' of source and destination data, you can further split this up into Mappers for each logical 'row', etc.
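To make those bullet points concrete, here is a hedged sketch of what the row model, a Specification and a Mapper could look like. Again, every name is invented; the row type simply fleshes out the placeholder from the earlier sketch:

using System;

// A source row that enforces its own invariants: a non-nullable field
// is guaranteed non-null from construction onward, so every consumer
// downstream can rely on it. Easy to test in isolation.
public class PaySourceRow
{
    public string EmployeeId { get; }
    public decimal GrossAmount { get; }
    public bool IsFirstCheck { get; }

    public PaySourceRow(string employeeId, decimal grossAmount, bool isFirstCheck)
    {
        EmployeeId = employeeId ?? throw new ArgumentNullException(nameof(employeeId));
        GrossAmount = grossAmount;
        IsFirstCheck = isFirstCheck;
    }
}

// Specification pattern: each filtering rule becomes a tiny class that
// can be tested with a handful of in-memory rows.
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

public class FirstCheckSpecification : ISpecification<PaySourceRow>
{
    public bool IsSatisfiedBy(PaySourceRow row) => row.IsFirstCheck;
}

// One destination row of the vendor's file format.
public class VendorDetailRow
{
    public string EmployeeId { get; set; }
    public string FormattedAmount { get; set; }
}

// A Mapper owns exactly one logical transformation.
public class DetailRowType1Mapper
{
    public VendorDetailRow Map(PaySourceRow source)
    {
        return new VendorDetailRow
        {
            EmployeeId = source.EmployeeId,
            FormattedAmount = source.GrossAmount.ToString("0.00")
        };
    }
}

A test for DetailRowType1Mapper now needs only a single in-memory PaySourceRow and an equality assertion - no database, no mocks.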

It never needs to become messy, and the added benefit (besides automated testing) is that the object model is now way more flexible. If you ever need to write another ETL application involving one of the two sides, you already have at least one third of the code written.

answered Oct 14 '22 by Mark Seemann


What you should have initially are integration tests. These will test that the functions perform as expected, and you can hit the actual database for them.

Once you have that safety net, you can start refactoring the code to be more maintainable and introducing unit tests.
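For example, a first test in that safety net could be a characterization (golden-file) test: run the existing, unchanged export against a known database state and diff the output against a previously captured baseline. A rough NUnit sketch, assuming (hypothetically) that the exporter can be pointed at a connection string and an output path:

using System.IO;
using NUnit.Framework;

[TestFixture]
public class PaycheckExportCharacterizationTests
{
    // Capture the export output once, commit it as the approved
    // baseline, and fail whenever refactoring changes the output.
    // The connection string and file paths are hypothetical.
    [Test]
    public void ExportPaychecks_MatchesApprovedBaseline()
    {
        var exporter = new PaycheckExporter("Server=test;Database=PayrollTest");

        string actualPath = Path.Combine(Path.GetTempPath(), "paychecks_actual.txt");
        exporter.ExportPaychecks(actualPath);

        string expected = File.ReadAllText(@"Baselines\paychecks_expected.txt");
        string actual = File.ReadAllText(actualPath);

        Assert.AreEqual(expected, actual);
    }
}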

As mentioned by serbrech, Working Effectively with Legacy Code will help you no end; I would strongly advise reading it, even for greenfield projects.

http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052

The main question I would ask is: how often does the code change? If it changes infrequently, is it really worth the effort of introducing unit tests? If it changes frequently, then I would definitely consider cleaning it up a bit.

answered Oct 14 '22 by Burt