 

foreach function not working in Spark DataFrame

According to the DataFrame API, the definition is:

public void foreach(scala.Function1<Row,scala.runtime.BoxedUnit> f)

Applies a function f to all rows.

But when I try something like

DataFrame df = sql.read()
    .format("com.databricks.spark.csv")
    .option("header","true")
    .load("file:///home/hadoop/Desktop/examples.csv");

df.foreach(x->
{
   System.out.println(x);
});

I am getting a compile-time error. What is my mistake?

asked Jan 06 '17 by user6325753

2 Answers

You can convert it to a JavaRDD in order to use the lambda as you wish. The compile error happens because DataFrame.foreach expects a scala.Function1, which a Java lambda cannot implement (in the Scala 2.10/2.11 builds Spark used, Function1 is not a functional interface from Java's point of view), while JavaRDD.foreach takes org.apache.spark.api.java.function.VoidFunction, which is:

df.toJavaRDD().foreach(x -> System.out.println(x));
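
If you are on Spark 2.x, where DataFrame is just Dataset&lt;Row&gt;, a hedged alternative sketch: Dataset.foreach is overloaded with the Java functional interface ForeachFunction&lt;T&gt;, so a lambda works once a cast disambiguates the overloads (assuming df is a Dataset&lt;Row&gt;):

import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Row;

// Spark 2.x only: the cast selects the ForeachFunction<Row> overload
// of Dataset.foreach rather than the Scala Function1 overload.
df.foreach((ForeachFunction<Row>) row -> System.out.println(row));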
answered by Thomas Decaux

First, extend scala.runtime.AbstractFunction1 and implement Serializable (Spark ships the function to the executors, so it must be serializable), like below:

import java.io.Serializable;
import scala.runtime.AbstractFunction1;

public abstract class SerializableFunction1<T, R>
        extends AbstractFunction1<T, R> implements Serializable
{
}

Now use this SerializableFunction1 class as below.

import org.apache.spark.sql.Row;
import scala.runtime.BoxedUnit;

df.foreach(new SerializableFunction1<Row, BoxedUnit>() {
    @Override
    public BoxedUnit apply(Row row) {
        System.out.println(row.get(0));
        return BoxedUnit.UNIT; // Scala's Unit, returned via its singleton
    }
});
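
A usage caveat: an anonymous inner class keeps a reference to its enclosing instance, so if the surrounding class is not serializable, Spark's closure serialization will fail with a NotSerializableException at runtime. A sketch of a safer variant, using a hypothetical static nested class PrintFirstColumn that captures nothing:

// Hypothetical helper: a static nested class holds no outer reference,
// so only the function itself is serialized and shipped to executors.
static class PrintFirstColumn extends SerializableFunction1<Row, BoxedUnit> {
    @Override
    public BoxedUnit apply(Row row) {
        System.out.println(row.get(0));
        return BoxedUnit.UNIT;
    }
}

df.foreach(new PrintFirstColumn());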
answered by abaghel