
PySpark: How to calculate avg and count in a single groupBy? [duplicate]

I would like to calculate avg and count in a single groupBy statement in PySpark. How can I do that?

df = spark.createDataFrame([(1, 'John', 1.79, 28,'M', 'Doctor'),
                        (2, 'Steve', 1.78, 45,'M', None),
                        (3, 'Emma', 1.75, None, None, None),
                        (4, 'Ashley',1.6, 33,'F', 'Analyst'),
                        (5, 'Olivia', 1.8, 54,'F', 'Teacher'),
                        (6, 'Hannah', 1.82, None, 'F', None),
                        (7, 'William', 1.7, 42,'M', 'Engineer'),
                        (None,None,None,None,None,None),
                        (8,'Ethan',1.55,38,'M','Doctor'),
                        (9,'Hannah',1.65,None,'F','Doctor')]
                       , ['Id', 'Name', 'Height', 'Age', 'Gender', 'Profession'])

# This only shows avg, but I also need count right next to it. How can I do that?

df.groupBy("Profession").agg({"Age":"avg"}).show()
df.show()

Thank you.

asked Aug 01 '18 by melik

People also ask

How do you use groupBy and count in PySpark?

PySpark's groupBy count is used to get the number of records in each group. First call groupBy() on the DataFrame to group the records by one or more column values, then call count() to get the number of records per group.

How do you count duplicates in PySpark?

In PySpark, you can use distinct().count() on a DataFrame, or the countDistinct() SQL function, to get a distinct count. distinct() removes duplicate records (rows matching on all columns) from the DataFrame, and count() then returns the number of remaining records.

How do you calculate average in PySpark?

Using the select() method: to return the average value of multiple columns, call the avg() method inside select(), passing the column names separated by commas, where df is the input PySpark DataFrame and column_name is the column whose average you want.

How do you get the sum of count in PySpark?

sumDistinct() in PySpark returns the total (sum) of the distinct values in a column. It adds each unique value once and ignores duplicates.


1 Answer

For the same column:

from pyspark.sql import functions as F
df.groupBy("Profession").agg(F.mean('Age'), F.count('Age')).show()

If you're able to use different columns:

df.groupBy("Profession").agg({'Age':'avg', 'Gender':'count'}).show()
answered Sep 27 '22 by Pierre Gourseaud