
How to load data from saved file with Spark

Spark provides the method saveAsTextFile, which can easily store an RDD[T] on local disk or HDFS.

T is an arbitrary serializable class.

I want to reverse the operation. Is there something like a loadFromTextFile that can easily load a file into an RDD[T]?

To make it clear:

class A extends Serializable {
...
}

val path:String = "hdfs..."
val d1:RDD[A] = create_A

d1.saveAsTextFile(path)

val d2:RDD[A] = a_load_function(path) // this is the function I want

//d2 should be the same as d1
worldterminator asked May 15 '15 07:05


1 Answer

Use d1.saveAsObjectFile(path) to store the RDD and val d2 = sc.objectFile[A](path) to load it back.

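I don't think you can saveAsTextFile and read the result back as an RDD[A] directly. saveAsTextFile writes each element's toString, so reading the file gives you an RDD[String] that you have to transform into A yourself.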
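A minimal sketch of both routes, in case it helps. The case class A with its id and name fields, the helpers toLine and fromLine, the "id,name" line format, the local[*] master, and the temporary directory are all assumptions made for illustration; none of them come from the question. It assumes a standalone application with Spark on the classpath.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the question's class A (case classes are Serializable).
case class A(id: Int, name: String)

object RoundTrip {
  // Assumed text format: one record per line, written as "id,name".
  def toLine(a: A): String = s"${a.id},${a.name}"

  // Inverse of toLine; splits on the first comma only, so the name may contain commas.
  def fromLine(line: String): A = {
    val Array(id, name) = line.split(",", 2)
    A(id.toInt, name)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is used only so that the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("rdd-round-trip").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Fresh temporary directory; the save methods fail if the target path already exists.
      val base = Files.createTempDirectory("rdd-round-trip").toString
      val d1: RDD[A] = sc.parallelize(Seq(A(1, "x"), A(2, "y")))

      // Route 1: object file. Elements are stored with Java serialization.
      d1.saveAsObjectFile(s"$base/objects")
      val d2: RDD[A] = sc.objectFile[A](s"$base/objects")

      // Route 2: text file. Write an explicit line format and parse it when reading.
      d1.map(toLine).saveAsTextFile(s"$base/text")
      val d3: RDD[A] = sc.textFile(s"$base/text").map(fromLine)

      // Compare as sets because element order is not guaranteed after a reload.
      assert(d2.collect().toSet == d1.collect().toSet)
      assert(d3.collect().toSet == d1.collect().toSet)
    } finally {
      sc.stop()
    }
  }
}
```

The object file is the shorter route but is not human readable and depends on the class definition staying compatible. The text route keeps the files readable at the cost of maintaining the parser by hand.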

yjshen answered Oct 25 '22 21:10