My intention is to do the equivalent of the basic sql
select shipgrp, shipstatus, count(*) cnt
from shipstatus group by shipgrp, shipstatus
The examples that I have seen for spark dataframes include rollups by other columns: e.g.
df.groupBy($"shipgrp", $"shipstatus").agg(sum($"quantity"))
But no other column is needed in my case shown above. So what is the syntax and/or method call combination here?
Update A reader has suggested this question were a duplicate of dataframe: how to groupBy/count then filter on count in Scala : but that one is about filtering by count
: there is no filtering here.
PySpark Groupby Count is used to get the number of records for each group. So to perform the count, first, you need to perform the groupBy() on DataFrame which groups the records based on single or multiple column values, and then do the count() to get the number of records for each group.
Use count() by Column Namegroupby() to group the rows by column and use count() method to get the count for each group by ignoring None and Nan values. It works with non-floating type data as well. The below example does the grouping on Courses column and calculates count how many times each value is present.
The most simple method for pandas groupby count is by using the in-built pandas method named size(). It returns a pandas series that possess the total number of row count for each group. The basic working of the size() method is the same as len() method and hence, it is not affected by NaN values in the dataset.
1 Answer. Suppose you have a df that includes columns “name” and “age”, and on these two columns you want to perform groupBY. Now, in order to get other columns also after doing a groupBy you can use join function. Now, data_joined will have all columns including the count values.
You can similarly do count("*")
in spark agg
function:
df.groupBy("shipgrp", "shipstatus").agg(count("*").as("cnt"))
val df = Seq(("a", 1), ("a", 1), ("b", 2), ("b", 3)).toDF("A", "B")
df.groupBy("A", "B").agg(count("*").as("cnt")).show
+---+---+---+
| A| B|cnt|
+---+---+---+
| b| 2| 1|
| a| 1| 2|
| b| 3| 1|
+---+---+---+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With