Is it possible to mock an RDD without using SparkContext?
I want to unit test the following utility function:
def myUtilityFunction(data1: org.apache.spark.rdd.RDD[myClass1], data2: org.apache.spark.rdd.RDD[myClass2]): org.apache.spark.rdd.RDD[myClass1] = {...}
So I need to pass data1 and data2 to myUtilityFunction. How can I create data1 from a mock org.apache.spark.rdd.RDD[myClass1], instead of creating a real RDD from a SparkContext? Thank you!
I totally agree with @Holden on that!
Mocking RDDs is difficult; running your unit tests against a local SparkContext is preferred, as recommended in the programming guide.
I know this may not technically be a unit test, but it is hopefully close enough.
Unit Testing
Spark is friendly to unit testing with any popular unit test framework. Simply create a SparkContext in your test with the master URL set to local, run your operations, and then call SparkContext.stop() to tear it down. Make sure you stop the context within a finally block or the test framework’s tearDown method, as Spark does not support two contexts running concurrently in the same program.
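As a rough illustration of the approach the guide describes, here is a minimal sketch, not a definitive setup. The names LocalRddTestSketch, withLocalContext and doubleAll are hypothetical; doubleAll merely stands in for myUtilityFunction, whose body is not shown in the question. It assumes Spark is on the test classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LocalRddTestSketch {
  // Hypothetical stand-in for the function under test
  // (the real myUtilityFunction body is not given in the question).
  def doubleAll(data: RDD[Int]): RDD[Int] = data.map(_ * 2)

  // Runs `body` against a local SparkContext and always stops the context,
  // because Spark does not support two contexts running in the same program.
  def withLocalContext[A](body: SparkContext => A): A = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("rdd-unit-test")
    val sc = new SparkContext(conf)
    try body(sc) finally sc.stop()
  }

  // Builds a small real RDD from in-memory data and collects the result
  // back to the driver so it can be compared with plain Scala values.
  def run(): Array[Int] = withLocalContext { sc =>
    val input = sc.parallelize(Seq(1, 2, 3))
    doubleAll(input).collect()
  }
}
```

Inside a test framework, the body of withLocalContext would typically live in the setup and tearDown hooks, and the collected array would be checked with the framework's own assertions.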
But if you are really interested and still want to try mocking RDDs, I suggest reading the ImplicitSuite test code.
The only reason they pseudo-mock the RDD there is to check that the implicit conversions work with the compiler, so they don't actually need a real RDD.
def mockRDD[T]: org.apache.spark.rdd.RDD[T] = null
And it's not even a real mock. It just returns null typed as RDD[T], so nothing can actually be called on it at runtime.
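To make the limitation concrete, here is a small sketch of what the null placeholder can and cannot do. NullRddSketch, compileOnly and invokingFails are hypothetical names used only for illustration, and the example assumes Spark 1.3 or later, where the pair-RDD implicits are found without importing SparkContext._.

```scala
import org.apache.spark.rdd.RDD
import scala.util.Try

object NullRddSketch {
  // Same trick as in the answer: a null reference with the static type RDD[T].
  def mockRDD[T]: RDD[T] = null

  // This only needs to compile: it shows that the compiler resolves the
  // implicit conversion that adds groupByKey to an RDD of pairs.
  // It is never meant to be executed.
  def compileOnly(): Unit = {
    val rdd: RDD[(Int, Int)] = mockRDD
    rdd.groupByKey()
  }

  // Calling any method on the null reference throws a
  // NullPointerException, which Try captures as a Failure.
  def invokingFails: Boolean = Try(mockRDD[Int].count()).isFailure
}
```

So the placeholder is enough for a compile-time check, but a function like myUtilityFunction that actually transforms its inputs would fail on the first RDD call.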