Spark SQL 1.2.0 queries return a JavaRDD, while Spark SQL 1.3.0 queries return a DataFrame. Converting a DataFrame to a JavaRDD with DataFrame.toJavaRDD() seems to take quite a bit of time, so I tried to use DataFrame.map() directly and ran into a puzzling problem:
DataFrame df = sqlSC.sql(sql);
RDD<String> rdd = df.map(new AbstractFunction1<Row, String>() {
    @Override
    public String apply(Row t1) {
        return t1.getString(0);
    }
}, ?);
"?" should be scala.reflect.ClassTag. I used ClassManifestFactory.fromClass(String.class) and it didn't work. What should I put at "?".
By the way, the Java code in the "Interoperating with RDDs" section of http://spark.apache.org/docs/1.3.0/sql-programming-guide.html is not correct: it uses "map(new Function() {", but "Function" is not accepted there. It should be "Function1".
Try this:
RDD<String> rdd = df.map(new AbstractFunction1<Row, String>() {
    @Override
    public String apply(Row t1) {
        return t1.getString(0);
    }
}, scala.reflect.ClassManifestFactory.fromClass(String.class));
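The reason the Scala-side map wants that extra argument at all is Java type erasure: Spark needs the runtime class of the result element type (for example, to build typed arrays when the RDD is collected). As a plain-Java illustration of the same idea — not Spark code, and the mapToArray helper below is hypothetical — a generic map that produces an array must also be handed a Class<R> token, which plays the role of the ClassTag:

```java
import java.lang.reflect.Array;
import java.util.List;
import java.util.function.Function;

public class ClassTagSketch {
    // A generic map returning R[] cannot write "new R[n]" because of
    // erasure; it needs the runtime Class<R>, analogous to a ClassTag.
    static <T, R> R[] mapToArray(List<T> in, Function<T, R> f, Class<R> tag) {
        @SuppressWarnings("unchecked")
        R[] out = (R[]) Array.newInstance(tag, in.size());
        for (int i = 0; i < in.size(); i++) {
            out[i] = f.apply(in.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // String.class here is the moral equivalent of passing a
        // ClassTag for String to RDD.map.
        String[] names = mapToArray(List.of(1, 2, 3), i -> "row-" + i, String.class);
        System.out.println(String.join(",", names)); // row-1,row-2,row-3
    }
}
```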
Try this (worked for me):
JavaRDD<String> rdd = df.toJavaRDD().map(new Function<Row, String>() {
    @Override
    public String call(Row t1) {
        return t1.getString(0);
    }
});
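This compiles because JavaRDD.map takes Spark's Java-friendly org.apache.spark.api.java.function.Function, whose single method is call, while the Scala DataFrame.map wants a scala.Function1 (hence the AbstractFunction1 and ClassTag above). A minimal self-contained sketch of that adapter style — the Fn interface and mapList helper below are hypothetical stand-ins for Spark's Function and JavaRDD.map, so this runs without Spark on the classpath:

```java
import java.util.ArrayList;
import java.util.List;

public class FnSketch {
    // Stand-in for org.apache.spark.api.java.function.Function:
    // a single-method interface whose method is named call.
    interface Fn<T, R> {
        R call(T t);
    }

    // Stand-in for JavaRDD.map: applies fn.call to each element.
    static <T, R> List<R> mapList(List<T> in, Fn<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (T t : in) {
            out.add(fn.call(t));
        }
        return out;
    }

    public static void main(String[] args) {
        // Same shape as the anonymous class in the answer above:
        // implement call, return "column 0" of each row.
        List<String> firstCols = mapList(List.of("a|1", "b|2"), new Fn<String, String>() {
            @Override
            public String call(String row) {
                return row.split("\\|")[0];
            }
        });
        System.out.println(firstCols); // [a, b]
    }
}
```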