NullPointerException in Scala Spark, appears to be caused by collection type?

Tags:

apache-spark

sessionIdList is of type:

scala> sessionIdList
res19: org.apache.spark.rdd.RDD[String] = MappedRDD[17] at distinct at <console>:30

When I try to run the code below:

val x = sc.parallelize(List(1,2,3)) 
val cartesianComp = x.cartesian(x).map(x => (x))

val kDistanceNeighbourhood = sessionIdList.map(s => {
    cartesianComp.filter(v => v != null)
})

kDistanceNeighbourhood.take(1)

I receive this exception:

14/05/21 16:20:46 ERROR Executor: Exception in task ID 80
java.lang.NullPointerException
        at org.apache.spark.rdd.RDD.filter(RDD.scala:261)
        at $line94.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:38)
        at $line94.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)

However, if I use:

val l = sc.parallelize(List("1","2")) 
val kDistanceNeighbourhood = l.map(s => {    
    cartesianComp.filter(v => v != null)
})

kDistanceNeighbourhood.take(1)

then no exception is thrown.

The difference between the two code snippets is that in the first snippet sessionIdList is of type:

res19: org.apache.spark.rdd.RDD[String] = MappedRDD[17] at distinct at <console>:30

and in the second snippet "l" is of type:

scala> l
res13: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[32] at parallelize at <console>:12

Why is this error occurring?

Do I need to convert sessionIdList to a ParallelCollectionRDD in order to fix this?

asked May 21 '14 20:05

1 Answer

Spark doesn't support nesting of RDDs (see https://stackoverflow.com/a/14130534/590203 for another occurrence of the same problem), so you can't perform transformations or actions on RDDs inside of other RDD operations.

In the first case, you're seeing a NullPointerException thrown by the worker when it tries to access a SparkContext object that's only present on the driver and not the workers.

In the second case, my hunch is that the job ran locally on the driver and worked purely by accident.
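As a rough sketch of one way to avoid the nesting (not code from the question): collect the inner RDD on the driver, broadcast the resulting array, and filter that plain array inside the map, so the closure never touches an RDD or the SparkContext. This assumes the inner data is small enough to fit in memory on each worker. The object name `AvoidNestedRdds`, the helper `neighbourhoods`, the sample session IDs, and the `local[2]` master are all hypothetical names and values chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AvoidNestedRdds {

  // Hypothetical helper: pairs each session ID with the filtered contents of
  // `pairs`, without referencing any RDD inside the map closure.
  def neighbourhoods(
      sc: SparkContext,
      sessionIds: RDD[String],
      pairs: RDD[(Int, Int)]): RDD[(String, Array[(Int, Int)])] = {
    // Materialise the inner RDD on the driver (assumes it is small),
    // then ship the plain array to the workers as a broadcast variable.
    val localPairs = sc.broadcast(pairs.collect())
    sessionIds.map { s =>
      // localPairs.value is an ordinary Array, so filter here is the
      // Scala collections method, not an RDD transformation.
      (s, localPairs.value.filter(v => v != null))
    }
  }

  def main(args: Array[String]): Unit = {
    // Master and app name are placeholders for a local test run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("avoid-nested-rdds")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for sessionIdList from the question.
      val sessionIdList = sc.parallelize(List("a", "b", "a")).distinct()
      val x = sc.parallelize(List(1, 2, 3))
      val cartesianComp = x.cartesian(x)

      val kDistanceNeighbourhood = neighbourhoods(sc, sessionIdList, cartesianComp)
      kDistanceNeighbourhood.take(1).foreach { case (s, ps) =>
        println(s + " -> " + ps.mkString(", "))
      }
    } finally {
      sc.stop()
    }
  }
}
```

If the inner RDD is too large to collect, an alternative is to combine the two RDDs on the driver side with a transformation such as `sessionIdList.cartesian(cartesianComp)` and then filter or group the result, which keeps every RDD operation outside of worker closures.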

answered Oct 11 '22 12:10

Josh Rosen
