I'm trying to write a groupBy in Spark with Java. In SQL this would look like:
SELECT id, count(id) AS count, max(date) AS maxdate
FROM table
GROUP BY id;
But what is the Spark/Java equivalent of this query? Let's say the variable table
is a DataFrame, to keep the relation to the SQL query clear. I'm thinking of something like:
table = table.select(table.col("id"), (table.col("id").count()).as("count"), (table.col("date").max()).as("maxdate")).groupby("id")
Which is obviously incorrect, since you can't call aggregate functions like .count()
or .max() on columns, only on DataFrames. So how is this done in Spark with Java?
Thank you!
1 Answer
You could do this with org.apache.spark.sql.functions:
import org.apache.spark.sql.functions;

table.groupBy("id").agg(
        functions.count("id").as("count"),   // count(id) as count
        functions.max("date").as("maxdate")  // max(date) as maxdate
).show();
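Alternatively, since you already have the SQL, you can register the DataFrame as a temporary view and run the query as-is. A minimal sketch, assuming a SparkSession named spark is in scope (the view name my_table is arbitrary):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Expose the DataFrame to Spark SQL under a temporary name
table.createOrReplaceTempView("my_table");

// Run the original aggregation query unchanged
Dataset<Row> result = spark.sql(
        "SELECT id, count(id) AS count, max(date) AS maxdate FROM my_table GROUP BY id");
result.show();

Both approaches produce the same result; groupBy(...).agg(...) is just the programmatic form of the GROUP BY query.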