
How to update a Row/column value in an Apache Spark DataFrame?

I have an ordered Spark DataFrame, and I would like to change a few rows while iterating over it with the following code, but there does not seem to be any way to update a Row object.

orderedDataFrame.foreach(new Function1<Row,BoxedUnit>(){
    @Override
    public BoxedUnit apply(Row v1) {
        // How do I change Row here?
        // I want to change column no 2 using v1.get(2)
        // also what is BoxedUnit, and how do I use it
        return null;
    }
});

The code above also fails to compile with this error:

myclassname is not abstract and it does not override abstract method apply$mcVj$sp(long) in scala Function 1

I am new to Spark and am using the 1.4.0 release.

Asked by Umesh K on Jul 15 '15 at 18:07


1 Answer

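Rows and DataFrames in Spark are immutable, so you cannot change a Row in place. Instead, map each row to a new Row that carries the changed value, then build a new DataFrame from the result with the original schema. Here someMethod stands for whatever transformation you want to apply to column 2; it must return a value that still matches the schema's type for that column. Try this: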

final DataFrame withoutCurrency = sqlContext.createDataFrame(
        somedf.javaRDD().map(row -> {
            return RowFactory.create(row.get(0), row.get(1), someMethod(row.get(2)));
        }),
        somedf.schema());
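
For anyone who wants to try the snippet above end to end, here is a minimal sketch of the same map-and-rebuild pattern as a complete program. It assumes Spark 1.4.x on the classpath and Java 8 for the lambda. The class name UpdateColumnSketch, the stripCurrency helper (standing in for someMethod), the column names, the sample rows, and the local master are all made up for illustration and do not come from the original post.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class UpdateColumnSketch {

    // Hypothetical stand-in for someMethod: drops a leading '$' from the value.
    static String stripCurrency(Object value) {
        String s = String.valueOf(value);
        return s.startsWith("$") ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // Local master is an assumption so the sketch can run without a cluster.
        SparkConf conf = new SparkConf()
                .setAppName("UpdateColumnSketch")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Made-up three-column schema; column index 2 is the one to change.
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("id", DataTypes.IntegerType, false),
                DataTypes.createStructField("name", DataTypes.StringType, false),
                DataTypes.createStructField("price", DataTypes.StringType, false)));

        // Made-up sample data.
        JavaRDD<Row> rows = sc.parallelize(Arrays.asList(
                RowFactory.create(1, "apple", "$1.20"),
                RowFactory.create(2, "pear", "$0.80")));
        DataFrame somedf = sqlContext.createDataFrame(rows, schema);

        // Rows are immutable: build a new Row per input row, replacing column 2.
        JavaRDD<Row> updated = somedf.javaRDD().map(row ->
                RowFactory.create(row.get(0), row.get(1), stripCurrency(row.get(2))));

        // Reuse the original schema; the replaced value is still a string.
        DataFrame withoutCurrency = sqlContext.createDataFrame(updated, somedf.schema());

        withoutCurrency.show();
        sc.stop();
    }
}
```

If the goal is only to iterate over rows for side effects rather than to change them, somedf.javaRDD().foreach(...) accepts a Java-friendly VoidFunction<Row>, which avoids implementing scala.Function1 (and returning BoxedUnit) directly from Java.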
Answered by Piotr Sobolewski on Oct 15 '22 at 02:10