I would like to calculate both the average and the count in a single groupBy statement in PySpark. How can I do that?
df = spark.createDataFrame([(1, 'John', 1.79, 28,'M', 'Doctor'),
(2, 'Steve', 1.78, 45,'M', None),
(3, 'Emma', 1.75, None, None, None),
(4, 'Ashley',1.6, 33,'F', 'Analyst'),
(5, 'Olivia', 1.8, 54,'F', 'Teacher'),
(6, 'Hannah', 1.82, None, 'F', None),
(7, 'William', 1.7, 42,'M', 'Engineer'),
(None,None,None,None,None,None),
(8,'Ethan',1.55,38,'M','Doctor'),
(9,'Hannah',1.65,None,'F','Doctor')]
, ['Id', 'Name', 'Height', 'Age', 'Gender', 'Profession'])
# This only shows the average, but I also need the count right next to it. How can I do that?
df.groupBy("Profession").agg({"Age":"avg"}).show()
df.show()
Thank you.
A groupBy count in PySpark returns the number of records in each group. First call groupBy() on the DataFrame to group the records by one or more column values, then call count() on the grouped result to get the number of records per group.
To count distinct values in PySpark, you can either chain distinct().count() on the DataFrame or use the countDistinct() SQL function. distinct() removes duplicate rows (rows that match on every column), and count() then returns the number of rows that remain. countDistinct() instead counts the distinct non-null values of the columns you pass to it.
Method 1: Using the select() method. To return the average of one or more columns without grouping, call the avg() function inside select(), passing one avg() expression per column and separating the expressions with commas. Here df is the input PySpark DataFrame and each column name identifies a column whose average is wanted.
PySpark sumDistinct(): sumDistinct() returns the sum of the distinct values in a column of the DataFrame. Each unique value is added once, so duplicates do not contribute to the total. In Spark 3.2 and later, sumDistinct() is deprecated in favor of sum_distinct().
To get both aggregates for the same column, pass column expressions from pyspark.sql.functions to agg():
from pyspark.sql import functions as F
df.groupBy("Profession").agg(F.mean('Age'), F.count('Age')).show()
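If the two aggregates can use different columns, the dictionary form also works. A Python dict cannot hold the same key twice, so this form cannot express both avg and count on the same column: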
df.groupBy("Profession").agg({'Age':'avg', 'Gender':'count'}).show()