 

Methods of max() and sum() undefined in the Java Spark Dataframe API (1.4.1)

I put the sample code for DataFrame.groupBy() into my code, but it shows the methods max() and sum() as undefined.

df.groupBy("department").agg(max("age"), sum("expense"));

Which Java package should I import to use the max() and sum() methods?

Is the syntax of this sample code correct?

Jingyu Zhang asked Sep 08 '15 06:09



1 Answer

Adding the import didn't work for me; the Eclipse IDE still showed the compilation error.

However, the following method call worked:

df.groupBy("Gender").agg(
        org.apache.spark.sql.functions.max(df.col("Id")),
        org.apache.spark.sql.functions.sum(df.col("Income")));
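One likely reason max() and sum() show up as undefined: in the Java API they are static methods of the class `org.apache.spark.sql.functions`. A regular `import org.apache.spark.sql.functions;` only makes `functions.max(...)` available, while a static import such as `import static org.apache.spark.sql.functions.*;` is what allows the bare `max(...)` and `sum(...)` calls from the question. The fully qualified calls above avoid the import question entirely. The sketch below shows the same static-import mechanism with the JDK's own `Collectors` class, so it runs without Spark. It computes a per-department maximum age and total expense in plain Java, loosely mirroring the question's `agg(max("age"), sum("expense"))`. The `Employee` class and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Static imports: the same language feature that lets Spark code call
// max(...) instead of org.apache.spark.sql.functions.max(...).
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.reducing;
import static java.util.stream.Collectors.summingLong;

public class StaticImportGroupBy {

    // Hypothetical row type standing in for one DataFrame row.
    static final class Employee {
        final String department;
        final int age;
        final long expense;

        Employee(String department, int age, long expense) {
            this.department = department;
            this.age = age;
            this.expense = expense;
        }
    }

    // Largest age per department (keys sorted by department name).
    static Map<String, Integer> maxAgeByDepartment(List<Employee> rows) {
        return rows.stream().collect(groupingBy(
                (Employee e) -> e.department,
                TreeMap::new,
                mapping((Employee e) -> e.age, reducing(Integer.MIN_VALUE, Integer::max))));
    }

    // Total expense per department (keys sorted by department name).
    static Map<String, Long> sumExpenseByDepartment(List<Employee> rows) {
        return rows.stream().collect(groupingBy(
                (Employee e) -> e.department,
                TreeMap::new,
                summingLong((Employee e) -> e.expense)));
    }

    public static void main(String[] args) {
        List<Employee> rows = Arrays.asList(
                new Employee("eng", 30, 100L),
                new Employee("eng", 45, 250L),
                new Employee("ops", 28, 80L));
        System.out.println(maxAgeByDepartment(rows));     // {eng=45, ops=28}
        System.out.println(sumExpenseByDepartment(rows)); // {eng=350, ops=80}
    }
}
```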

If the aggregation involves only one field, the following shorter syntax also works:

df.groupBy("Gender").max("Income");
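The shorthand yields one row per distinct group value holding the largest value of the named column. For intuition, here is a plain-Java sketch (no Spark involved) of that per-group maximum: `Collectors.toMap` with `Long::max` as the merge function keeps the larger income whenever two rows share a gender. The `Person` class and its values are hypothetical. In Spark itself, the grouped-data shorthand methods `max`, `min`, `sum` and `avg` accept one or more column names and apply to numeric columns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupedMax {

    // Hypothetical row with the answer's column names: gender and income.
    static final class Person {
        final String gender;
        final long income;

        Person(String gender, long income) {
            this.gender = gender;
            this.income = income;
        }
    }

    // One entry per gender; Long::max resolves key collisions by keeping
    // the larger income, which is exactly a per-group maximum.
    static Map<String, Long> maxIncomeByGender(List<Person> rows) {
        return rows.stream().collect(Collectors.toMap(
                (Person p) -> p.gender,
                (Person p) -> p.income,
                Long::max,
                TreeMap::new));
    }

    public static void main(String[] args) {
        List<Person> rows = Arrays.asList(
                new Person("F", 5200L),
                new Person("M", 4100L),
                new Person("F", 6100L));
        System.out.println(maxIncomeByGender(rows)); // {F=6100, M=4100}
    }
}
```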
vishak answered Oct 03 '22 11:10
