Hadoop testing using MRUnit

Tags:

unit-testing

hadoop

I'm retrofitting a bunch of existing Hadoop unit tests that previously ran in an in-memory cluster (using MiniMRCluster) to run under MRUnit instead. The existing test cases essentially provide input to the Map phase and then test the output from the Reduce phase.

I have three questions, and the best answer to any of them will qualify:

1) What do I lose, architecturally, by unit testing with MRUnit instead of an in-memory cluster?

2) Is it worthwhile to break the existing test cases up into Map-only tests and Reduce-only tests or not? Are there any cases where I would have to break them up?

3) Are there any testing scenarios that MRUnit is unable to cover?


asked May 25 '11 01:05

Paul W

1 Answer

The retrofitting process has taught me some potential answers, which I'm going to post here. I would still prefer to hear what others have to say, though, so I won't accept this answer.

1) I lose at least two things. First, the MR plumbing is mocked, so the mocks may hide a problem that exists in the real MR job. Second, an MR job reads its input from the file system and writes its output to the file system, and it partitions and orders records between the map and reduce phases. MRUnit doesn't completely handle these aspects of Hadoop, so if an MR job depends on them, they can't be tested. It is still possible to rewrite the tests to cover just the Map and Reduce logic, though.

2) For the most part, it isn't worthwhile to break up existing tests. If an existing test depends on a partitioner, for example, then it may make sense to break up the test so that the Map and Reduce can be tested without the partitioner involved. In general, though, it isn't worth doing "just to do it."

3) Yes -- partitioners for one, output formats for another. This may not be as big a deal for some people, but many of our existing jobs rely on these two features, and since the unit tests check the final output from the output format, I'm having to rewrite quite a few tests to get them to work.
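One way to keep partitioning logic covered after such a rewrite is to call the partitioner directly as an ordinary function. The following is a minimal plain-Java sketch with no Hadoop or MRUnit dependency. `SketchPartitioner` is a hypothetical stand-in that only mirrors the shape of Hadoop's `Partitioner.getPartition(key, value, numPartitions)`, and `FirstCharPartitioner` is an assumed example rule, not anything from the jobs described above.

```java
// Hypothetical stand-in mirroring the shape of Hadoop's
// Partitioner.getPartition(key, value, numPartitions); not the Hadoop class.
interface SketchPartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Assumed example rule: route records by the first character of the key,
// so keys that share a first character go to the same reducer.
class FirstCharPartitioner implements SketchPartitioner<String, Integer> {
    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        if (key.isEmpty()) {
            return 0; // send empty keys to the first partition
        }
        // char is unsigned, so the remainder is never negative
        return key.charAt(0) % numPartitions;
    }
}

class PartitionerSketch {
    public static void main(String[] args) {
        FirstCharPartitioner partitioner = new FirstCharPartitioner();
        int reducers = 4;

        int apple = partitioner.getPartition("apple", 1, reducers);
        int avocado = partitioner.getPartition("avocado", 2, reducers);
        int banana = partitioner.getPartition("banana", 3, reducers);

        // Keys with the same first character must land in the same partition.
        System.out.println("apple and avocado together: " + (apple == avocado));
        // Every result must be a valid reducer index.
        System.out.println("banana in range: " + (banana >= 0 && banana < reducers));
    }
}
```

A real partitioner extends Hadoop's `Partitioner` class, but the test has the same shape: construct it, call `getPartition` with representative keys, and assert on the returned index.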

[edit]

I just read a blog post from Cloudera that also speaks to these questions:

http://www.cloudera.com/blog/2009/07/debugging-mapreduce-programs-with-mrunit/
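To illustrate what splitting a pipeline test into a Map-only test and a Reduce-only test (point 2) amounts to, here is a small plain-Java sketch. It has no Hadoop or MRUnit dependency, and the word-count logic and the `WordCountSketch` name are assumptions chosen purely for illustration. In MRUnit itself, the two halves would correspond roughly to `MapDriver` and `ReduceDriver`, with `MapReduceDriver` covering the combined pipeline.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class WordCountSketch {
    // Map step: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Reduce step: sum all counts that were grouped under one key.
    static Map.Entry<String, Integer> reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new SimpleEntry<>(key, sum);
    }

    public static void main(String[] args) {
        // Map-only check: one input line in, raw (word, 1) pairs out.
        System.out.println(map("cat dog cat"));

        // Reduce-only check: the grouped values are supplied by hand,
        // so no shuffle, sort, or partitioner is involved.
        System.out.println(reduce("cat", Arrays.asList(1, 1)));
    }
}
```

The Reduce-only check receives its grouped input by hand, which is exactly why it stays independent of partitioning and ordering; the trade-off is that nothing verifies the grouping itself.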


answered Oct 27 '22 09:10

Paul W
