Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NullPointerException in Scala Spark, appears to be caused be collection type?

sessionIdList is of type :

scala> sessionIdList
res19: org.apache.spark.rdd.RDD[String] = MappedRDD[17] at distinct at <console>:30

When I try to run below code :

val x = sc.parallelize(List(1,2,3)) 
val cartesianComp = x.cartesian(x).map(x => (x))

val kDistanceNeighbourhood = sessionIdList.map(s => {
    cartesianComp.filter(v => v != null)
})

kDistanceNeighbourhood.take(1)

I receive exception :

14/05/21 16:20:46 ERROR Executor: Exception in task ID 80
java.lang.NullPointerException
        at org.apache.spark.rdd.RDD.filter(RDD.scala:261)
        at $line94.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:38)
        at $line94.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)

However if I use :

val l = sc.parallelize(List("1","2")) 
val kDistanceNeighbourhood = l.map(s => {    
    cartesianComp.filter(v => v != null)
})

kDistanceNeighbourhood.take(1)

Then no exception is displayed

The difference between the two code snippets is that in first snippet sessionIdList is of type :

res19: org.apache.spark.rdd.RDD[String] = MappedRDD[17] at distinct at <console>:30

and in second snippet "l" is of type

scala> l
res13: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[32] at parallelize at <console>:12

Why is this error occuring ?

Do I need to convert sessionIdList to ParallelCollectionRDD in order to fix this ?

like image 917
blue-sky Avatar asked May 21 '14 20:05

blue-sky


People also ask

What type of exception is NullPointerException?

NullPointerException is a runtime exception in Java that occurs when a variable is accessed which is not pointing to any object and refers to nothing or null. Since the NullPointerException is a runtime exception, it doesn't need to be caught and handled explicitly in application code.

What is NullPointerException in Scala?

NullPointerException is thrown when a reference variable is accessed (or de-referenced) and is not pointing to any object. This error can be resolved by using a try-catch block or an if-else condition to check if a reference variable is null before dereferencing it.

Can NullPointerException be caught?

Programmers typically catch NullPointerException under three circumstances: The program contains a null pointer dereference. Catching the resulting exception was easier than fixing the underlying problem. The program explicitly throws a NullPointerException to signal an error condition.


1 Answers

Spark doesn't support nesting of RDDs (see https://stackoverflow.com/a/14130534/590203 for another occurrence of the same problem), so you can't perform transformations or actions on RDDs inside of other RDD operations.

In the first case, you're seeing a NullPointerException thrown by the worker when it tries to access a SparkContext object that's only present on the driver and not the workers.

In the second case, my hunch is the job was run locally on the driver and worked purely by accident.

like image 88
Josh Rosen Avatar answered Oct 11 '22 12:10

Josh Rosen