Based on the following DataFrame:
val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")

+---+-----+----+
| ID|Categ|Amnt|
+---+-----+----+
|  1|    A|  10|
|  2|    A|   5|
|  3|    B|  56|
+---+-----+----+
I would like to obtain the number of IDs and the total amount by category:
+-----+-----+---------+
|Categ|count|sum(Amnt)|
+-----+-----+---------+
|    B|    1|       56|
|    A|    2|       15|
+-----+-----+---------+
Is it possible to do the count and the sum without having to do a join? This is what I am currently doing:

client.groupBy("Categ").count
  .join(client.withColumnRenamed("Categ","cat")
    .groupBy("cat")
    .sum("Amnt"), 'Categ === 'cat)
  .drop("cat")
Maybe something like this:
client.createOrReplaceTempView("client")
spark.sql("SELECT Categ, count(Categ), sum(Amnt) FROM client GROUP BY Categ").show()
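To reproduce the exact header from the desired output (Categ, count, sum(Amnt)), an alias can be added to the count; a minimal sketch:

spark.sql("SELECT Categ, count(Categ) AS count, sum(Amnt) FROM client GROUP BY Categ").show()

Spark typically names the unaliased sum column sum(Amnt) after the expression itself, so only the count needs an alias.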
I'm giving a different example than yours. Multiple aggregate functions are possible like this; try it accordingly:
// In 1.3.x, in order for the grouping column "department" to show up,
// it must be included explicitly as part of the agg function call.
df.groupBy("department").agg($"department", max("age"), sum("expense"))

// In 1.4+, grouping column "department" is included automatically.
df.groupBy("department").agg(max("age"), sum("expense"))
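Applied to the client DataFrame from the question, the same single-pass pattern could look like this (a sketch, assuming Spark 1.4+; the as("count") alias is only there to mirror the header of the desired output):

// requires import org.apache.spark.sql.functions.{count, sum}
client.groupBy("Categ")
  .agg(count("ID").as("count"), sum("Amnt"))
  .show()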
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

val spark: SparkSession = SparkSession
  .builder.master("local")
  .appName("MyGroup")
  .getOrCreate()

import spark.implicits._

val client: DataFrame = spark.sparkContext.parallelize(
  Seq((1,"A",10),(2,"A",5),(3,"B",56))
).toDF("ID","Categ","Amnt")

client.groupBy("Categ").agg(sum("Amnt"), count("ID")).show()
+-----+---------+---------+
|Categ|sum(Amnt)|count(ID)|
+-----+---------+---------+
|    B|       56|        1|
|    A|       15|        2|
+-----+---------+---------+
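Since both aggregates are computed inside a single groupBy, the data is aggregated in one pass; the self-join from the question, which scans and aggregates client twice, is not needed.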