This is my code which has data in which I want to perform the task using pandas.DataFrame.groupby
import pandas as pd
data = {'employees_no': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
'employees_name': ['Jugal Sompura', 'Maya Rajput', 'Chaitya Panchal', 'Sweta Rampariya', 'Prakshal Patel', 'Dhruv Panchal', 'Prachi Desai', 'Krunal Gosai', 'Hemil Soni', 'Gopal Pithadia', 'Jatin Shah', 'Raj Patel', 'Shreya Desai'],
'department_name': ['HR', 'Administrative Assistant', 'Production', 'Accountant', 'Production', 'Engineer', 'Finance', 'Engineer', 'Quality Assurance', 'Engineer', 'Engineer', 'Customer Service', 'CEO'],
'salary': [130000.0, 65000.0, 45000.0, 65000.0, 47000.0, 40000.0, 90000.0, 45000.0, 35000.0, 45000.0, 30000.0, 40000.0, 250000.0]
}
df = pd.DataFrame (data, columns = ['employees_no', 'employees_name', 'department_name', 'salary'])
print(df)
---------------------------------------------------------------------
employees_no employees_name department_name salary
0 1 Jugal Sompura HR 130000.0
1 2 Maya Rajput Administrative Assistant 65000.0
2 3 Chaitya Panchal Production 45000.0
3 4 Sweta Rampariya Accountant 65000.0
4 5 Prakshal Patel Production 47000.0
5 6 Dhruv Panchal Engineer 40000.0
6 7 Prachi Desai Finance 90000.0
7 8 Krunal Gosai Engineer 45000.0
8 9 Hemil Soni Quality Assurance 35000.0
9 10 Gopal Pithadia Engineer 45000.0
10 11 Jatin Shah Engineer 30000.0
11 12 Raj Patel Customer Service 40000.0
12 13 Shreya Desai CEO 250000.0
---------------------------------------------------------------------
I tried this and could only get this output.
print(df.groupby('department_name').agg({'salary':'mean'}))
---------------------------------------------------------------------
department_name salary
Accountant 65000.0
Administrative Assistant 65000.0
CEO 250000.0
Customer Service 40000.0
Engineer 40000.0
Finance 90000.0
HR 130000.0
Production 46000.0
Quality Assurance 35000.0
---------------------------------------------------------------------
I'm not able to get output like this...
department_name employees_name avg_salary
Accountant Sweta Rampariya 65000.0
Administrative Assistant Maya Rajput 65000.0
CEO Shreya Desai 250000.0
Customer Service Raj Patel 40000.0
Engineer Dhruv Panchal 40000.0
Gopal Pithadia
Krunal Gosai
Jatin Shah
Finance Prachi Desai 90000.0
HR Jugal Sompura 130000.0
Production Chaitya Panchal 46000.0
Prakshal Patel
Quality Assurance Hemil Soni 35000.0
Can you help me with this?
To find just the average salaries from a table use this mySQL query. Show activity on this post. Show activity on this post. Looks like you would need to check for an overlap in the from_date and to_date, of the employee salary history ( salaries table) and the employees department history ( dept_emp table).
To get the latter result, we could use an expression that calculates the number of days the salary was effective for in the department... We could multiple that by the effective salary, add all of those together, and then divide by the total number of days: Thanks for contributing an answer to Stack Overflow!
As an example, if an employee worked in a department and the salary was 20,000 for one year, and then salary was 30,000 for four years... should the average salary be reported as 25,000 (the average of the two distinct values), or would it be 28,000 (total salary for all years, divided by the number of years.)
You need pandas.DataFrame.groupby.transform
:
df["avg_salary"] = df.groupby("department_name")["salary"].transform("mean")
new_df = df.set_index(["department_name", "employees_name"]).sort_index()
print(new_df["avg_salary"])
Output:
department_name employees_name
Accountant Sweta Rampariya 65000.0
Administrative Assistant Maya Rajput 65000.0
CEO Shreya Desai 250000.0
Customer Service Raj Patel 40000.0
Engineer Dhruv Panchal 40000.0
Gopal Pithadia 40000.0
Jatin Shah 40000.0
Krunal Gosai 40000.0
Finance Prachi Desai 90000.0
HR Jugal Sompura 130000.0
Production Chaitya Panchal 46000.0
Prakshal Patel 46000.0
Quality Assurance Hemil Soni 35000.0
Name: avg_salary, dtype: float64
Extending what @Chris did and adding the part of remove average salary values if department_name is same.
Here's the full code:
import pandas as pd
data = {'employees_no': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
'employees_name': ['Jugal Sompura', 'Maya Rajput', 'Chaitya Panchal', 'Sweta Rampariya', 'Prakshal Patel', 'Dhruv Panchal', 'Prachi Desai', 'Krunal Gosai', 'Hemil Soni', 'Gopal Pithadia', 'Jatin Shah', 'Raj Patel', 'Shreya Desai'],
'department_name': ['HR', 'Administrative Assistant', 'Production', 'Accountant', 'Production', 'Engineer', 'Finance', 'Engineer', 'Quality Assurance', 'Engineer', 'Engineer', 'Customer Service', 'CEO'],
'salary': [130000.0, 65000.0, 45000.0, 65000.0, 47000.0, 40000.0, 90000.0, 45000.0, 35000.0, 45000.0, 30000.0, 40000.0, 250000.0]
}
df = pd.DataFrame (data)
df['avg_sal'] = df.groupby('department_name')['salary'].transform('mean')
new_df = df.set_index(["department_name", "employees_name"]).sort_index()
new_df.loc[new_df.index.get_level_values(0).duplicated()==True,'avg_sal']=''
print (new_df['avg_sal'])
This will print as follows:
department_name employees_name
Accountant Sweta Rampariya 65000
Administrative Assistant Maya Rajput 65000
CEO Shreya Desai 250000
Customer Service Raj Patel 40000
Engineer Dhruv Panchal 40000
Gopal Pithadia
Jatin Shah
Krunal Gosai
Finance Prachi Desai 90000
HR Jugal Sompura 130000
Production Chaitya Panchal 46000
Prakshal Patel
Quality Assurance Hemil Soni 35000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With