The goal is to implement different type classes (like Semigroup, Monad, Functor, etc.) provided by Scalaz for Spark's RDD (distributed collection). Unfortunately, I cannot get any of the type classes that take higher-kinded types (like Monad, Functor, etc.) to work with RDDs.
RDDs are defined (simplified) as:
abstract class RDD[T: ClassTag]() {
  def map[U: ClassTag](f: T => U): RDD[U] = { ... }
}
Complete code for RDDs can be found here.
Here is one example that works fine:
import scalaz._, Scalaz._
import org.apache.spark.rdd.RDD
implicit def semigroupRDD[A] = new Semigroup[RDD[A]] {
  def append(x: RDD[A], y: => RDD[A]) = x.union(y)
}
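For illustration, a minimal usage sketch of this instance (assuming a SparkContext named sc is available and the implicit above is in scope; the values xs and ys are made up for the example):

import scalaz._, Scalaz._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def combined(sc: SparkContext): RDD[Int] = {
  val xs: RDD[Int] = sc.parallelize(Seq(1, 2, 3))
  val ys: RDD[Int] = sc.parallelize(Seq(4, 5, 6))
  // |+| resolves to Semigroup#append, i.e. xs.union(ys) here
  xs |+| ys
}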
Here is one example that doesn't work:
implicit def functorRDD = new Functor[RDD] {
  override def map[A, B](fa: RDD[A])(f: A => B): RDD[B] = {
    fa.map(f)
  }
}
This fails with:
error: No ClassTag available for B
    fa.map(f)
The error is pretty clear: the map implemented on RDD expects a ClassTag for the result type (see above), while the Scalaz Functor/Monad signatures do not carry one. Is it even possible to make this work without modifying Scalaz and/or Spark?
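For reference, the relevant part of Scalaz's Functor looks roughly like this (paraphrased): map must work for completely unconstrained A and B, leaving no place to demand a ClassTag.

trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}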
Short answer: no.

For type classes like Functor, the restriction is that for any A and B, unconstrained, given a function A => B you get a lifted function RDD[A] => RDD[B]. In Spark you cannot pick arbitrary A and B, since you need a ClassTag for B, as you saw.

For other type classes like Semigroup, where the type doesn't change during the operation and therefore does not need a ClassTag, it works.
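If you are willing to step outside Scalaz, you can define your own ClassTag-constrained variant of Functor. The following is only a sketch, and the name CTFunctor is hypothetical rather than anything provided by Scalaz or Spark:

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical type class: like Functor, but map also demands a ClassTag
// for the result type, which makes an RDD instance possible.
trait CTFunctor[F[_]] {
  def map[A, B: ClassTag](fa: F[A])(f: A => B): F[B]
}

implicit val rddCTFunctor: CTFunctor[RDD] = new CTFunctor[RDD] {
  def map[A, B: ClassTag](fa: RDD[A])(f: A => B): RDD[B] = fa.map(f)
}

Such an instance cannot be used where Scalaz expects its own Functor, but it captures the same mapping behaviour for RDDs.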