 

foreach function not working in Spark DataFrame

According to the DataFrame API, the definition is:

public void foreach(scala.Function1<Row,scala.runtime.BoxedUnit> f)

Applies a function f to all rows.

But when I try something like

DataFrame df = sql.read()
    .format("com.databricks.spark.csv")
    .option("header","true")
    .load("file:///home/hadoop/Desktop/examples.csv");

df.foreach(x->
{
   System.out.println(x);
});

I am getting a compile-time error. What is my mistake?

asked Jan 06 '17 by user6325753

2 Answers

You can convert it to a JavaRDD in order to use the lambda as you wish. The compile error happens because DataFrame.foreach expects a scala.Function1, which a Java lambda cannot implement (in the Scala 2.10/2.11 builds Spark used, Function1 is not a functional interface from Java's point of view), while JavaRDD.foreach takes org.apache.spark.api.java.function.VoidFunction, which is:

df.toJavaRDD().foreach(x -> System.out.println(x));
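
If you are on Spark 2.x, where DataFrame is just Dataset&lt;Row&gt;, a hedged alternative sketch: Dataset.foreach is overloaded with the Java functional interface ForeachFunction&lt;T&gt;, so a lambda works once a cast disambiguates the overloads (assuming df is a Dataset&lt;Row&gt;):

import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Row;

// Spark 2.x only: the cast selects the ForeachFunction<Row> overload
// of Dataset.foreach rather than the Scala Function1 overload.
df.foreach((ForeachFunction<Row>) row -> System.out.println(row));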
answered by Thomas Decaux

First, extend scala.runtime.AbstractFunction1 and implement Serializable (Spark ships the function to the executors, so it must be serializable), like below:

import java.io.Serializable;
import scala.runtime.AbstractFunction1;

public abstract class SerializableFunction1<T, R>
        extends AbstractFunction1<T, R> implements Serializable
{
}

Now use this SerializableFunction1 class as below.

import org.apache.spark.sql.Row;
import scala.runtime.BoxedUnit;

df.foreach(new SerializableFunction1<Row, BoxedUnit>() {
    @Override
    public BoxedUnit apply(Row row) {
        System.out.println(row.get(0));
        return BoxedUnit.UNIT; // Scala's Unit, returned via its singleton
    }
});
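
A usage caveat: an anonymous inner class keeps a reference to its enclosing instance, so if the surrounding class is not serializable, Spark's closure serialization will fail with a NotSerializableException at runtime. A sketch of a safer variant, using a hypothetical static nested class PrintFirstColumn that captures nothing:

// Hypothetical helper: a static nested class holds no outer reference,
// so only the function itself is serialized and shipped to executors.
static class PrintFirstColumn extends SerializableFunction1<Row, BoxedUnit> {
    @Override
    public BoxedUnit apply(Row row) {
        System.out.println(row.get(0));
        return BoxedUnit.UNIT;
    }
}

df.foreach(new PrintFirstColumn());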
answered by abaghel