 

How to construct ClassTag for Spark SQL DataFrame Mapping?

Spark SQL 1.2.0 queries return JavaRDD. Spark SQL 1.3.0 queries return DataFrame. Converting a DataFrame to a JavaRDD via DataFrame.toJavaRDD seems to take quite a bit of time, so I tried to use DataFrame.map() directly and ran into a puzzling problem:

DataFrame df = sqlSC.sql(sql);
RDD<String> rdd = df.map(new AbstractFunction1<Row, String>() {
    @Override
    public String apply(Row t1) {
        return t1.getString(0);
    }
}, ?);

The "?" should be a scala.reflect.ClassTag. I tried ClassManifestFactory.fromClass(String.class) and it didn't work. What should I put at "?"?

By the way, the Java code in the "Interoperating with RDDs" section of http://spark.apache.org/docs/1.3.0/sql-programming-guide.html is not correct: it uses "map(new Function() {", but "Function" is not acceptable there. It should be "Function1".

asked Mar 16 '15 by Paul Z Wu

2 Answers

Try this:

RDD<String> rdd = df.map(new AbstractFunction1<Row, String>() {
    @Override
    public String apply(Row t1) {
        return t1.getString(0);
    }
}, scala.reflect.ClassManifestFactory.fromClass(String.class));
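As an aside, a ClassTag can also be constructed directly through the ClassTag companion object, which serves the same purpose here. A minimal sketch, assuming the Scala library matching your Spark build is on the classpath:

```java
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

public class ClassTagExample {
    public static void main(String[] args) {
        // Build a ClassTag<String> from the Java Class object; this is the
        // ClassTag analogue of ClassManifestFactory.fromClass(String.class).
        ClassTag<String> tag = ClassTag$.MODULE$.apply(String.class);
        System.out.println(tag.runtimeClass()); // class java.lang.String
    }
}
```

Either form can be passed as the second argument to DataFrame.map from Java.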
answered Sep 21 '22 by Sivanand Sivaram


Try this (it worked for me):

JavaRDD<String> rdd = df.toJavaRDD().map(new Function<Row, String>() {
    @Override
    public String call(Row t1) {
        return t1.getString(0);
    }
});
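For context, here is an end-to-end sketch of this approach, using a hypothetical "people" table and a local SparkContext (assuming Spark 1.3.x APIs and the spark-core/spark-sql dependencies; not verified against a running cluster):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class DataFrameMapExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("df-map").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlSC = new SQLContext(sc);

        // "people" is a hypothetical registered table with a string column.
        DataFrame df = sqlSC.sql("SELECT name FROM people");

        // Going through toJavaRDD() lets you use the Java-friendly Function
        // interface (with call()) instead of Scala's Function1 and a ClassTag.
        JavaRDD<String> names = df.toJavaRDD().map(new Function<Row, String>() {
            @Override
            public String call(Row row) {
                return row.getString(0);
            }
        });

        sc.stop();
    }
}
```

The trade-off mentioned in the question still applies: this route converts to a JavaRDD first, whereas the ClassTag approach maps on the underlying Scala RDD directly.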
answered Sep 20 '22 by Yassine Jouini