Say, I have a dataframe like this one:
                     affinity
applicant_id job_id
1            a              7
             b              7
             c              5
2            a              0
             b              4
             c              2
3            a              4
             b              8
             c              1
I need to match each applicant to a job so that (a) higher affinity is preferred; (b) no applicant is matched to more than one job; (c) no job is matched to more than one applicant. So in the example above I would like to get
                     affinity
applicant_id job_id
3            b              8
1            a              7
2            c              2
The best I can think of is
tmp = candidates.sort_values(ascending=False).copy()
matches = []
while len(tmp):
    # Series.iteritems was removed in pandas 2.0; use items() instead
    (applicant, job), affinity = next(tmp.items())
    matches.append((applicant, job))
    tmp = tmp.loc[(tmp.index.get_level_values('applicant_id') != applicant)
                  & (tmp.index.get_level_values('job_id') != job)]
candidates.reindex(matches)
Can this be achieved in pandas without explicit iteration?
This is the typical linear sum assignment problem.
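For reference, `scipy.optimize.linear_sum_assignment` selects one entry per row and per column so that the total cost is minimal. A toy example with a made-up 3×3 cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j] = cost of assigning worker i to task j (made-up numbers)
cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])

# returns one column index per row; all column indices are distinct
rows, cols = linear_sum_assignment(cost)
print(cols)                     # [1 0 2]
print(cost[rows, cols].sum())   # 5, the minimal total cost
```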
We'll build an applicant-by-job matrix, filling the missing combinations with an absurdly large penalty so that they can never be part of an optimal assignment. A job only appears in this matrix if at least one applicant has an affinity for it, so this works.
from scipy import optimize
import pandas as pd

df = pd.DataFrame({'applicant_id': [1]*3 + [2]*3 + [3]*3 + [4],
                   'job_id': ['a', 'b', 'c']*3 + ['h'],
                   'affinity': [7, 7, 5, 0, 4, 2, 4, 8, 1, 10]})

df1 = df.pivot(index='applicant_id', columns='job_id', values='affinity').fillna(-10**8)
#job_id                 a            b            c            h
#applicant_id
#1                    7.0          7.0          5.0 -100000000.0
#2                    0.0          4.0          2.0 -100000000.0
#3                    4.0          8.0          1.0 -100000000.0
#4           -100000000.0 -100000000.0 -100000000.0         10.0

# linear_sum_assignment minimizes total cost, so negate to maximize affinity
row_ind, col_ind = optimize.linear_sum_assignment(df1.to_numpy() * -1)

# DataFrame.lookup was removed in pandas 2.0; index the underlying array directly
pd.DataFrame(df1.to_numpy()[row_ind, col_ind],
             columns=['affinity'],
             index=pd.MultiIndex.from_arrays([df1.index[row_ind], df1.columns[col_ind]]))
                     affinity
applicant_id job_id
1            a            7.0
2            c            2.0
3            b            8.0
4            h           10.0
With more jobs than applicants, everyone gets assigned but some jobs remain unfilled; with more applicants than jobs, the lowest-affinity applicants remain unassigned.
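`linear_sum_assignment` accepts rectangular matrices directly, so this handles the unbalanced case. A quick sanity check with a hypothetical 3×2 affinity matrix (three applicants, two jobs; numbers made up):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# three applicants (rows), two jobs (columns) -- one applicant must be left out
affinity = np.array([[7, 1],
                     [2, 4],
                     [1, 3]])

# negate to maximize; only min(n_rows, n_cols) = 2 assignments are returned
rows, cols = linear_sum_assignment(-affinity)
print(list(zip(rows, cols)))   # [(0, 0), (1, 1)]: applicant 2 stays unassigned
```

The optimizer keeps the pair of assignments with the highest total affinity (7 + 4 = 11) and drops the third applicant.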