So I'm running into an issue where a filter I'm using on an RDD can potentially create an empty RDD. I feel that doing a count() in order to test for emptiness would be very expensive, and was wondering if there is a more performant way to handle this situation.
Here is an example of what this issue might look like:
import org.apache.spark.rdd.RDD

val b: RDD[String] = sc.parallelize(Seq("a", "ab", "abc"))
println(b.filter(a => !a.contains("a")).reduce(_+_))
would give the result:

java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1005)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$36.apply(RDD.scala:1005)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1005)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:985)
Does anyone have any suggestions for how I should go about addressing this edge case?
map and reduce both operate on the elements of a collection, but they differ in shape: map is a one-to-one transformation, so it cannot collapse many elements into a single one, while reduce combines all elements into one accumulated value using a binary operator. A quick sketch of the difference follows.
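A small illustration on a toy RDD (the values here are only illustrative):

val nums = sc.parallelize(Seq(1, 2, 3))

// map is one-to-one: three inputs produce three outputs
val doubled = nums.map(_ * 2)   // RDD containing 2, 4, 6

// reduce collapses all elements into a single value
val total = nums.reduce(_ + _)  // 6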
cogroup() can be used for much more than just implementing joins. We can also use it to implement intersect by key. Additionally, cogroup() can work on three or more RDDs at once.
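For illustration, a minimal intersect-by-key could be built on cogroup() like this; the helper name intersectByKey is my own, not a Spark API:

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Keep only the (key, value) pairs from `left` whose key also appears in `right`.
def intersectByKey[K: ClassTag, V: ClassTag](left: RDD[(K, V)],
                                             right: RDD[(K, V)]): RDD[(K, V)] =
  left.cogroup(right).flatMap { case (k, (leftVals, rightVals)) =>
    // a key survives only if both sides contributed at least one value
    if (rightVals.nonEmpty) leftVals.map(v => (k, v)) else Nil
  }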
fold(): This RDD action aggregates the elements of each partition, and then the results of all the partitions, using a given zero value. reduce(): This RDD action reduces the elements of the dataset using the specified binary operator; it throws on an empty RDD, whereas fold() falls back to its zero value.
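A small contrast, reusing the question's data:

val words = sc.parallelize(Seq("a", "ab", "abc"))

// reduce: binary operator only; throws UnsupportedOperationException when the RDD is empty
words.reduce(_ + _)    // concatenation of all elements (partition order dependent)

// fold: the zero value "" is used for each partition and for the final merge,
// so it also works when the RDD is empty
words.fold("")(_ + _)  // same concatenation, but safe on an empty RDD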
An EmptyRDD has no partitions. Using this logic, we can check the RDD's partitions array: if it is empty, the RDD is an EmptyRDD, and we can avoid saving empty batches to HDFS. Note that an RDD that merely contains no elements (e.g. the result of a filter) still keeps its partitions, so this check only catches a true EmptyRDD; rdd.isEmpty() is the more general test.
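A minimal sketch of that guard before a write (the output path is illustrative):

// Skip the save when the RDD has no partitions at all (a true EmptyRDD).
// Caveat: a filtered RDD keeps its partitions even when they are all empty,
// so rdd.isEmpty() is the more general test.
if (rdd.partitions.nonEmpty) {
  rdd.saveAsTextFile("hdfs:///output/batch")  // illustrative path
}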
Consider .fold("")(_ + _) instead of .reduce(_ + _).
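Applied to the question's example, fold returns the zero value instead of throwing:

import org.apache.spark.rdd.RDD

val b: RDD[String] = sc.parallelize(Seq("a", "ab", "abc"))

// reduce would throw UnsupportedOperationException here; fold falls back to ""
println(b.filter(a => !a.contains("a")).fold("")(_ + _))  // prints an empty string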
How about isEmpty?
scala> val b = sc.parallelize(Seq("a","ab","abc"))
b: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[1] at parallelize at <console>:24
scala> b.isEmpty
res1: Boolean = false
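Continuing the same REPL session with the filtered RDD from the question (the exact res numbers and RDD ids will vary):

scala> val filtered = b.filter(a => !a.contains("a"))
filtered: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:26

scala> filtered.isEmpty
res2: Boolean = true

scala> if (!filtered.isEmpty) println(filtered.reduce(_ + _))

RDD.isEmpty (available since Spark 1.3) only needs to find a single element in some partition, so it is far cheaper than a full count() for this kind of guard.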