value join is not a member of org.apache.spark.rdd.RDD

I get this error:

value join is not a member of 
    org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0])))
        forSome { type _0 <: (String, Double) }]

The only suggestion I found is to import org.apache.spark.SparkContext._, which I am already doing.

What am I doing wrong?

EDIT: changing the code to eliminate forSome (i.e., so that the object has type org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[(String, Double)])))]) solved the problem. Is this a bug in Spark?

sds asked Mar 16 '23 15:03


1 Answer

join is a member of org.apache.spark.rdd.PairRDDFunctions. So why does the implicit conversion to it not trigger?

scala> val s = Seq[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]()
scala> val r = sc.parallelize(s)
scala> r.join(r) // Gives your error message.
scala> val p = new org.apache.spark.rdd.PairRDDFunctions(r)
<console>:25: error: no type parameters for constructor PairRDDFunctions: (self: org.apache.spark.rdd.RDD[(K, V)])(implicit kt: scala.reflect.ClassTag[K], implicit vt: scala.reflect.ClassTag[V], implicit ord: Ordering[K])org.apache.spark.rdd.PairRDDFunctions[K,V] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }])
 --- because ---
argument expression's type is not compatible with formal parameter type;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(?K, ?V)]
Note: (Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) } >: (?K, ?V), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)
               ^
<console>:25: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(K, V)]
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)

I'm sure that error message is clear to everyone else, but just for my own slow self let's try to make sense of it. PairRDDFunctions has two type parameters, K and V. Your forSome applies to the whole pair, so the pair cannot be split into separate K and V types. There are no K and V for which RDD[(K, V)] would equal your RDD type, and because RDD is invariant in T, a mere subtype relationship is not enough.

However, you could have the forSome apply only to the value, instead of the whole pair. join works now, because this type can be separated into K (Long) and V (the existential type).

scala> val s2 = Seq[(Long, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) })]()
scala> val r2 = sc.parallelize(s2)
scala> r2.join(r2)
res0: org.apache.spark.rdd.RDD[(Long, ((Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }))] = MapPartitionsRDD[5] at join at <console>:26
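
The same effect can be reproduced without Spark. The following is a minimal sketch, assuming Scala 2 (forSome is not available in Scala 3). Box, PairOps, WholePair, ValueOnly and demoKeys are hypothetical names made up for illustration: Box stands in for the invariant RDD, and PairOps stands in for PairRDDFunctions (here written as an implicit class, which may differ from how Spark wires up its conversion).

```scala
import scala.language.existentials

object InvariantPairDemo {
  // Hypothetical stand-in for RDD: invariant in T, like the RDD in the error message.
  final class Box[T](val items: Seq[T])

  // Hypothetical stand-in for PairRDDFunctions: only available on Box[(K, V)].
  implicit class PairOps[K, V](self: Box[(K, V)]) {
    def keys: Seq[K] = self.items.map(_._1)
  }

  // forSome wraps the whole pair: no single K and V make Box[(K, V)] equal Box[WholePair].
  type WholePair = (Long, Array[A]) forSome { type A <: (String, Double) }

  // forSome wraps only the value: K = Long, V = the existential array type.
  type ValueOnly = (Long, Array[A] forSome { type A <: (String, Double) })

  def demoKeys: Seq[Long] = {
    val whole = new Box(Seq.empty[WholePair])
    // whole.keys  // expected to be rejected, analogous to "value join is not a member of ..."

    val arr: Array[(String, Double)] = Array(("a", 1.0))
    val valueOnly = new Box(Seq[ValueOnly]((1L, arr)))
    valueOnly.keys // the implicit applies because the element type splits into K and V
  }

  def main(args: Array[String]): Unit = println(demoKeys)
}
```

Uncommenting the whole.keys line should show the compiler refusing the conversion for the same reason as in the Spark session, while the ValueOnly version goes through.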
Daniel Darabos answered Mar 19 '23 11:03