
Mock a Spark RDD in unit tests

Is it possible to mock an RDD without using a SparkContext?

I want to unit test the following utility function:

 def myUtilityFunction(data1: org.apache.spark.rdd.RDD[myClass1], data2: org.apache.spark.rdd.RDD[myClass2]): org.apache.spark.rdd.RDD[myClass1] = {...}

So I need to pass data1 and data2 to myUtilityFunction. How can I create data1 from a mock org.apache.spark.rdd.RDD[myClass1], instead of creating a real RDD from a SparkContext? Thank you!

Edamame asked Jun 19 '15 18:06

1 Answer

I totally agree with @Holden on that!

Mocking RDDs is difficult; executing your unit tests against a local Spark context is preferred, as recommended in the programming guide.

I know this may not technically be a unit test, but it is hopefully close enough.

Unit Testing

Spark is friendly to unit testing with any popular unit test framework. Simply create a SparkContext in your test with the master URL set to local, run your operations, and then call SparkContext.stop() to tear it down. Make sure you stop the context within a finally block or the test framework’s tearDown method, as Spark does not support two contexts running concurrently in the same program.
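Concretely, a test along those lines might look like the sketch below. This is a minimal ScalaTest suite and only a sketch: myClass1, myClass2, and myUtilityFunction are the names from the question and are assumed to be defined elsewhere, and the empty sample data is a placeholder you would replace with representative instances.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    class MyUtilityFunctionSuite extends FunSuite with BeforeAndAfterAll {

      @transient private var sc: SparkContext = _

      override def beforeAll(): Unit = {
        // "local[2]" runs Spark in-process with two worker threads: no cluster needed
        sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
      }

      override def afterAll(): Unit = {
        // Always stop the context: Spark does not support two concurrent contexts
        if (sc != null) sc.stop()
      }

      test("myUtilityFunction produces the expected result") {
        // parallelize turns local collections into real (small) RDDs;
        // replace the empty seqs with representative myClass1/myClass2 instances
        val data1 = sc.parallelize(Seq.empty[myClass1])
        val data2 = sc.parallelize(Seq.empty[myClass2])

        val result = myUtilityFunction(data1, data2)

        // collect() pulls the RDD back to the driver so ordinary assertions work
        assert(result.collect().isEmpty)
      }
    }

Creating and stopping the context in beforeAll/afterAll keeps the quoted advice intact: the context is torn down even when a test fails.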

But if you are really interested and still want to try mocking RDDs, I suggest reading the ImplicitSuite test code.

The only reason they pseudo-mock the RDD there is to test whether implicits work well with the compiler; they don't actually need a real RDD.

def mockRDD[T]: org.apache.spark.rdd.RDD[T] = null

And it's not even a real mock. It just creates a null reference of type RDD[T].
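For reference, the relevant part of that suite looks roughly like this (paraphrased from Spark's ImplicitSuite; the methods only need to compile and are never executed, so the null reference is never dereferenced):

    import org.apache.spark.rdd.RDD

    // Paraphrased from Spark's ImplicitSuite: the goal is a compile-time check
    // that rdd.groupByKey() resolves through the implicit conversion to
    // PairRDDFunctions, so the RDD itself is never touched at runtime.
    class ImplicitSuite {
      def mockRDD[T]: RDD[T] = null

      def testRddToPairRDDFunctions(): Unit = {
        val rdd: RDD[(Int, Int)] = mockRDD
        rdd.groupByKey()
      }
    }

That trick only works because nothing is ever computed; the moment your test calls an action like collect() or count(), you need a real RDD backed by a SparkContext.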

eliasah answered Sep 18 '22 15:09