
Convert RDD to Dataset in Java Spark

I have an RDD that I need to convert into a Dataset. I tried:

Dataset<Person> personDS =  sqlContext.createDataset(personRDD, Encoders.bean(Person.class));

The line above fails with this error:

cannot resolve method createDataset(org.apache.spark.api.java.JavaRDD<Main.Person>, org.apache.spark.sql.Encoder<T>)

However, I can get a Dataset by converting to a DataFrame first. The code below works:

Dataset<Row> personDF = sqlContext.createDataFrame(personRDD, Person.class);
Dataset<Person> personDS = personDF.as(Encoders.bean(Person.class));
vdep asked Jul 26 '17 12:07


2 Answers

.createDataset() accepts an RDD<T>, not a JavaRDD<T>. JavaRDD is a wrapper around RDD that makes calls from Java code easier. It holds the RDD internally, and that RDD can be accessed using .rdd(). The following creates a Dataset:

Dataset<Person> personDS =  sqlContext.createDataset(personRDD.rdd(), Encoders.bean(Person.class));
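
To illustrate why the original call does not resolve, here is a minimal sketch using plain Java only. ToyRDD, ToyJavaRDD, and this createDataset are hypothetical stand-ins invented for the example, not Spark's classes; the sketch only assumes the relationship described above, where the wrapper holds the underlying object and is not a subtype of it.

```java
import java.util.List;

// Toy stand-ins for illustration only; these are NOT Spark's classes.
final class WrapperSketch {

    // Plays the role of the underlying RDD<T>.
    static final class ToyRDD<T> {
        final List<T> items;
        ToyRDD(List<T> items) { this.items = List.copyOf(items); }
    }

    // Plays the role of JavaRDD<T>: it holds a ToyRDD<T> but is not a subtype of it.
    static final class ToyJavaRDD<T> {
        private final ToyRDD<T> underlying;
        ToyJavaRDD(ToyRDD<T> underlying) { this.underlying = underlying; }

        // Mirrors the idea of JavaRDD.rdd(): hand back the wrapped object.
        ToyRDD<T> rdd() { return underlying; }
    }

    // Plays the role of createDataset: declared for ToyRDD<T> only.
    static <T> List<T> createDataset(ToyRDD<T> rdd) {
        return rdd.items;
    }

    public static void main(String[] args) {
        ToyJavaRDD<String> people =
                new ToyJavaRDD<>(new ToyRDD<>(List.of("Ann", "Bob")));

        // createDataset(people);  // would not compile: ToyJavaRDD is not a ToyRDD
        List<String> ds = createDataset(people.rdd()); // compiles: unwrap first
        System.out.println(ds);
    }
}
```

Because the wrapper and the wrapped type are unrelated in the type hierarchy, the compiler finds no matching overload until the wrapper is unwrapped, which is what .rdd() does in the line above.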
vdep answered Oct 17 '22 08:10


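Call .toDS() on your RDD and you will get a Dataset. Note that .toDS() is provided by Scala's implicits (import spark.implicits._), so it applies to Scala code rather than to a JavaRDD.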

Let me know if it helps. Cheers.

Chitral Verma answered Oct 17 '22 09:10