I have something like this:
fromJobtitle toJobtitle size
0 CEO CEO 65
1 CEO Vice President 23
2 CEO Employee 56
3 Vice President CEO 112
4 Employee CEO 20
I would like to count the number of co-occurrences so that both directions of a pair are combined (showing only the total between the two job titles, regardless of which one is "from" and which is "to").
Example output:
0 CEO Vice President 135
1 CEO Employee 76
2 CEO CEO 65
import pandas as pd
df = pd.DataFrame({
'fromJobtitle': ['CEO', 'CEO', 'CEO', 'Vice President', 'Employee'],
'toJobtitle': ['CEO', 'Vice President', 'Employee', 'CEO', 'CEO'],
'size': [65, 23, 56, 112, 20]
})
df['combination'] = df.apply(lambda row: tuple(sorted([
row['fromJobtitle'],
row['toJobtitle']
])), axis=1)
then:
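# sum only 'size'; newer pandas versions would otherwise also "sum" (concatenate) the string columns
df = df.groupby('combination')['size'].sum().reset_index()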
results:
combination size
0 (CEO, CEO) 65
1 (CEO, Employee) 76
2 (CEO, Vice President) 135
finally:
df['from'] = df.apply(lambda row: row['combination'][0], axis=1)
df['to'] = df.apply(lambda row: row['combination'][1], axis=1)
df = df.drop('combination', axis=1)
df.head()
result:
size from to
0 65 CEO CEO
1 76 CEO Employee
2 135 CEO Vice President
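If you want the result ordered like the example in the question (largest total first, original column names), the steps above can be wrapped up roughly like this. This is only a sketch: combine_pairs and its parameters are names I made up for illustration, and it groups on two sorted columns instead of a tuple column.

```python
import pandas as pd

def combine_pairs(df, a='fromJobtitle', b='toJobtitle', value='size'):
    # Hypothetical helper: order each pair alphabetically so that
    # (x, y) and (y, x) end up with the same key.
    ordered = [sorted(pair) for pair in zip(df[a], df[b])]
    tmp = pd.DataFrame(ordered, columns=[a, b])
    tmp[value] = df[value].to_numpy()
    # Sum the value column per unordered pair.
    out = tmp.groupby([a, b], as_index=False)[value].sum()
    # Largest totals first, as in the question's example output.
    return out.sort_values(value, ascending=False).reset_index(drop=True)

df = pd.DataFrame({
    'fromJobtitle': ['CEO', 'CEO', 'CEO', 'Vice President', 'Employee'],
    'toJobtitle': ['CEO', 'Vice President', 'Employee', 'CEO', 'CEO'],
    'size': [65, 23, 56, 112, 20]
})
result = combine_pairs(df)
print(result)
```

For the sample data the totals come out as 135, 76 and 65, in that order.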
Try:
df.groupby(lambda x: tuple(sorted(df.loc[x, ['fromJobtitle', 'toJobtitle']])))[['size']].sum()
Here is the result:
size
(CEO, CEO) 65
(CEO, Employee) 76
(CEO, Vice President) 135
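On a large frame the per-row lambda can get slow. A possible alternative, assuming both columns hold plain strings, is to sort each row's pair with numpy.sort and then group on the two sorted columns. Treat it as a sketch rather than a drop-in replacement; the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fromJobtitle': ['CEO', 'CEO', 'CEO', 'Vice President', 'Employee'],
    'toJobtitle': ['CEO', 'Vice President', 'Employee', 'CEO', 'CEO'],
    'size': [65, 23, 56, 112, 20]
})

cols = ['fromJobtitle', 'toJobtitle']
# Sort the two titles within each row so both directions share one key.
ordered = np.sort(df[cols].to_numpy(dtype=object), axis=1)

totals = (
    df.assign(fromJobtitle=ordered[:, 0], toJobtitle=ordered[:, 1])
      .groupby(cols)['size']
      .sum()
)
print(totals)
```

Here totals is a Series indexed by the (fromJobtitle, toJobtitle) pair, so totals.loc[('CEO', 'Vice President')] gives the combined count for that pair.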