Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How I can aggregate employee based on their department and show average salary in each department using groupby pandas?

This is my code which has data in which I want to perform the task using pandas.DataFrame.groupby

import pandas as pd
data = {'employees_no':  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
        'employees_name': ['Jugal Sompura', 'Maya Rajput', 'Chaitya Panchal', 'Sweta Rampariya', 'Prakshal Patel', 'Dhruv Panchal', 'Prachi Desai', 'Krunal Gosai', 'Hemil Soni', 'Gopal Pithadia', 'Jatin Shah', 'Raj Patel', 'Shreya Desai'],
        'department_name': ['HR', 'Administrative Assistant', 'Production', 'Accountant', 'Production', 'Engineer', 'Finance', 'Engineer', 'Quality Assurance', 'Engineer', 'Engineer', 'Customer Service', 'CEO'],
        'salary': [130000.0, 65000.0, 45000.0, 65000.0, 47000.0, 40000.0, 90000.0, 45000.0, 35000.0, 45000.0, 30000.0, 40000.0, 250000.0]
        }
df = pd.DataFrame (data, columns = ['employees_no', 'employees_name', 'department_name', 'salary'])
print(df)
---------------------------------------------------------------------
employees_no   employees_name           department_name      salary
0              1    Jugal Sompura                        HR  130000.0
1              2      Maya Rajput  Administrative Assistant   65000.0
2              3  Chaitya Panchal                Production   45000.0
3              4  Sweta Rampariya                Accountant   65000.0
4              5   Prakshal Patel                Production   47000.0
5              6    Dhruv Panchal                  Engineer   40000.0
6              7     Prachi Desai                   Finance   90000.0
7              8     Krunal Gosai                  Engineer   45000.0
8              9       Hemil Soni         Quality Assurance   35000.0
9             10   Gopal Pithadia                  Engineer   45000.0
10            11       Jatin Shah                  Engineer   30000.0
11            12        Raj Patel          Customer Service   40000.0
12            13     Shreya Desai                       CEO  250000.0
---------------------------------------------------------------------

I tried this and could only get this output.

print(df.groupby('department_name').agg({'salary':'mean'}))
---------------------------------------------------------------------
department_name             salary
Accountant                 65000.0
Administrative Assistant   65000.0
CEO                       250000.0
Customer Service           40000.0
Engineer                   40000.0
Finance                    90000.0
HR                        130000.0
Production                 46000.0
Quality Assurance          35000.0
---------------------------------------------------------------------

I'm not able to get output like this...

department_name          employees_name     avg_salary        
Accountant               Sweta Rampariya     65000.0
Administrative Assistant Maya Rajput         65000.0
CEO                      Shreya Desai       250000.0
Customer Service         Raj Patel           40000.0
Engineer                 Dhruv Panchal       40000.0
                         Gopal Pithadia      
                         Krunal Gosai        
                         Jatin Shah     
Finance                  Prachi Desai        90000.0
HR                       Jugal Sompura      130000.0
Production               Chaitya Panchal     46000.0
                         Prakshal Patel      
Quality Assurance        Hemil Soni          35000.0

Can you help me with this?

like image 896
Ravindra Mevada Avatar asked Sep 29 '20 04:09

Ravindra Mevada


People also ask

How to find average salaries from a table in MySQL?

To find just the average salaries from a table use this mySQL query. Show activity on this post. Show activity on this post. Looks like you would need to check for an overlap in the from_date and to_date, of the employee salary history ( salaries table) and the employees department history ( dept_emp table).

How to calculate the number of days a salary is effective?

To get the latter result, we could use an expression that calculates the number of days the salary was effective for in the department... We could multiple that by the effective salary, add all of those together, and then divide by the total number of days: Thanks for contributing an answer to Stack Overflow!

What is the average salary for a department?

As an example, if an employee worked in a department and the salary was 20,000 for one year, and then salary was 30,000 for four years... should the average salary be reported as 25,000 (the average of the two distinct values), or would it be 28,000 (total salary for all years, divided by the number of years.)


2 Answers

You need pandas.DataFrame.groupby.transform:

df["avg_salary"] = df.groupby("department_name")["salary"].transform("mean")
new_df = df.set_index(["department_name", "employees_name"]).sort_index()
print(new_df["avg_salary"])

Output:

department_name           employees_name 
Accountant                Sweta Rampariya     65000.0
Administrative Assistant  Maya Rajput         65000.0
CEO                       Shreya Desai       250000.0
Customer Service          Raj Patel           40000.0
Engineer                  Dhruv Panchal       40000.0
                          Gopal Pithadia      40000.0
                          Jatin Shah          40000.0
                          Krunal Gosai        40000.0
Finance                   Prachi Desai        90000.0
HR                        Jugal Sompura      130000.0
Production                Chaitya Panchal     46000.0
                          Prakshal Patel      46000.0
Quality Assurance         Hemil Soni          35000.0
Name: avg_salary, dtype: float64
like image 154
Chris Avatar answered Sep 29 '22 00:09

Chris


Extending what @Chris did and adding the part of remove average salary values if department_name is same.

Here's the full code:

import pandas as pd
data = {'employees_no':  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
        'employees_name': ['Jugal Sompura', 'Maya Rajput', 'Chaitya Panchal', 'Sweta Rampariya', 'Prakshal Patel', 'Dhruv Panchal', 'Prachi Desai', 'Krunal Gosai', 'Hemil Soni', 'Gopal Pithadia', 'Jatin Shah', 'Raj Patel', 'Shreya Desai'],
        'department_name': ['HR', 'Administrative Assistant', 'Production', 'Accountant', 'Production', 'Engineer', 'Finance', 'Engineer', 'Quality Assurance', 'Engineer', 'Engineer', 'Customer Service', 'CEO'],
        'salary': [130000.0, 65000.0, 45000.0, 65000.0, 47000.0, 40000.0, 90000.0, 45000.0, 35000.0, 45000.0, 30000.0, 40000.0, 250000.0]
        }

df = pd.DataFrame (data)
df['avg_sal'] = df.groupby('department_name')['salary'].transform('mean')
new_df = df.set_index(["department_name", "employees_name"]).sort_index()
new_df.loc[new_df.index.get_level_values(0).duplicated()==True,'avg_sal']=''
print (new_df['avg_sal'])

This will print as follows:

department_name           employees_name 
Accountant                Sweta Rampariya     65000
Administrative Assistant  Maya Rajput         65000
CEO                       Shreya Desai       250000
Customer Service          Raj Patel           40000
Engineer                  Dhruv Panchal       40000
                          Gopal Pithadia           
                          Jatin Shah               
                          Krunal Gosai             
Finance                   Prachi Desai        90000
HR                        Jugal Sompura      130000
Production                Chaitya Panchal     46000
                          Prakshal Patel           
Quality Assurance         Hemil Soni          35000
like image 32
Joe Ferndz Avatar answered Sep 29 '22 01:09

Joe Ferndz