 

Treat Spark RDD like plain Seq

I have a CLI application for transforming JSONs. Most of its code is mapping, flatMapping and traversing Lists of JValues with for-comprehensions. Now I want to port this application to Spark, but it seems I need to rewrite every function 1:1, just with RDD[JValue] instead of List[JValue].

Is there any way (such as a type class) to write functions that accept both Lists and RDDs?

asked Oct 30 '22 20:10 by chuwy


1 Answer

If you want to share the per-element processing code between the local and Spark versions, you can move the lambdas/anonymous functions that you pass to map/flatMap into named functions and re-use them.
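A minimal sketch of that idea. `enrich` is a hypothetical stand-in for whatever JValue transformation the pipeline performs; it works on plain Strings here so the example is self-contained and needs no json4s or Spark dependency:

```scala
// A named function holding the per-element logic, instead of an inline lambda.
def enrich(record: String): List[String] =
  List(record.trim, record.trim.toUpperCase)

// The same function plugs into the local collection unchanged:
val local: List[String] = List(" foo ", " bar ").flatMap(enrich)

// With Spark (not run here), the call site is identical:
//   val distributed: RDD[String] = rdd.flatMap(enrich)
```

Because `enrich` is just a function value, only the call site (`List.flatMap` vs `RDD.flatMap`) differs between the two versions; the logic itself is written once.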

If you want to re-use the logic for how the maps/flatMaps/etc. are sequenced, you could also create implicit conversions from both RDD and Seq to a custom trait that exposes only the shared operations. However, implicit conversions can become quite confusing, and I don't really think this is a good idea (but you could do it if you disagree with me :)).
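A related shape, closer to the type class the question mentions, is to abstract the shared operations behind a higher-kinded trait and give it an instance per container. `MapLike` and `doubleAll` below are hypothetical names for illustration; note the real RDD instance is messier than the List one, because RDD.map/flatMap require a ClassTag:

```scala
// Hypothetical type class over the operations List and RDD share.
trait MapLike[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
  def flatMap[A, B](fa: F[A])(f: A => Iterable[B]): F[B]
}

object MapLike {
  // Instance for plain Lists.
  implicit val listMapLike: MapLike[List] = new MapLike[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
    def flatMap[A, B](fa: List[A])(f: A => Iterable[B]): List[B] = fa.flatMap(f)
  }
  // A Spark instance would look the same in spirit, but RDD.map/flatMap
  // take an implicit ClassTag[B], so the signatures above would need an
  // extra implicit parameter to support it:
  //   implicit def rddMapLike: MapLike[RDD] = ...
}

// Pipeline logic written once against the abstraction, not against
// List or RDD directly.
def doubleAll[F[_]](data: F[Int])(implicit F: MapLike[F]): F[Int] =
  F.map(F.flatMap(data)(x => List(x, x + 1)))(_ * 2)
```

With the List instance in scope, `doubleAll(List(1, 2))` yields `List(2, 4, 4, 6)`; the same `doubleAll` would run on an RDD once an RDD instance exists. Whether this buys more than it costs depends on how much pipeline logic you actually share.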

answered Nov 15 '22 07:11 by Holden