I have a dataframe with 4 columns. 3 of these columns contain string values (people's names) and the 4th one has an int value (salary for a job done).
The string values are not unique: the same name can show up several times in each column, but never more than once per row.
import pandas as pd

data = {
    'worker1': ['Sam', 'Jack', 'Matt', 'Paul', 'Tim'],
    'worker2': ['Alex', 'Amy', 'Sam', 'Alice', 'Amanda'],
    'worker3': ['Alice', 'Aaron', 'Tony', 'Jack', 'Sam'],
    'earnings': [4564552, 4573547, 3567567, 6357653, 7648576]}
df = pd.DataFrame(data, columns=['worker1', 'worker2', 'worker3', 'earnings'])
print(df)
worker1 worker2 worker3 earnings
'Sam' 'Alex' 'Alice' 4564552
'Jack' 'Amy' 'Aaron' 4573547
'Matt' 'Sam' 'Tony' 3567567
'Paul' 'Alice' 'Jack' 6357653
'Tim' 'Amanda' 'Sam' 7648576
So what I need is to sum all the earnings associated with a given name, regardless of whether it appears in worker1, worker2 or worker3. I'm not sure if I should use a groupby function for this, build a dictionary or go another route.
This would be what I'm trying to accomplish:
workers total_earnings
Sam 15780695
Alex 4564552
Alice 10922205
Jack 10931200
Amy 4573547
Aaron 4573547
Matt 3567567
Tony 3567567
Paul 6357653
Tim 7648576
Amanda 7648576
I'm quite new to pandas, so I don't yet know which functions fit a task like this. I've mostly tried to use a groupby function but that was a disaster.
Any help would be highly appreciated.
A bit lengthy, but does what you want:
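>>> # Select only 'earnings' so the name columns are not summed (concatenated) as well
>>> df1 = pd.concat([df.groupby('worker1')[['earnings']].sum(), df.groupby('worker2')[['earnings']].sum(), df.groupby('worker3')[['earnings']].sum()])
>>> df1.groupby(df1.index).sum()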
earnings
Aaron 4573547
Alex 4564552
Alice 10922205
Amanda 7648576
Amy 4573547
Jack 10931200
Matt 3567567
Paul 6357653
Sam 15780695
Tim 7648576
Tony 3567567
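A possibly shorter route, offered as a sketch rather than the definitive way: reshape the three worker columns into one long column with `melt`, then do a single `groupby`. The names `long_df` and `totals` are illustrative, and the column names `workers` and `total_earnings` are taken from the desired output in the question.

```python
import pandas as pd

data = {
    'worker1': ['Sam', 'Jack', 'Matt', 'Paul', 'Tim'],
    'worker2': ['Alex', 'Amy', 'Sam', 'Alice', 'Amanda'],
    'worker3': ['Alice', 'Aaron', 'Tony', 'Jack', 'Sam'],
    'earnings': [4564552, 4573547, 3567567, 6357653, 7648576],
}
df = pd.DataFrame(data)

# Reshape to long form: each row's earnings is repeated once per worker column,
# and all names end up in a single 'workers' column.
long_df = df.melt(id_vars='earnings', value_name='workers')

# Sum the earnings per name and shape the result like the table in the question.
totals = (
    long_df.groupby('workers')['earnings']
    .sum()
    .rename('total_earnings')
    .reset_index()
)
print(totals)
```

Because each name appears at most once per row, every job's earnings is counted exactly once for each person on it. The result is sorted alphabetically by name, which is `groupby`'s default.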