Pandas matching algorithm

Question

Say, I have a dataframe like this one:

                     affinity
applicant_id job_id          
1            a              7
             b              7
             c              5
2            a              0
             b              4
             c              2
3            a              4
             b              8
             c              1

I need to match each applicant to a job so that (a) higher affinity is preferred; (b) no applicant is matched to more than one job; (c) no job is matched to more than one applicant. So in the example above I would like to get

                     affinity
applicant_id job_id          
3            b              8
1            a              7
2            c              2

The best I can think of is

tmp = candidates.sort_values(ascending=False).copy()
matches = []
while len(tmp):
    (applicant, job), affinity = next(tmp.iteritems())
    matches.append((applicant, job))
    tmp = tmp.loc[(tmp.index.get_level_values('applicant_id') != applicant)
                  & (tmp.index.get_level_values('job_id') != job)]
candidates.reindex(matches)

Can this be achieved in pandas without explicit iteration?

ALollz · Accepted Answer

This is the typical linear sum assignment problem.

We'll make matrix, filling the missing values with some absurdly high penalty, so that they shouldn't ever get matched. A job will only appear in this matrix if at least one worker has an affinity for it, so this will work.

Sample Data

from scipy import optimize
import pandas as pd

df = pd.DataFrame({'applicant_id': [1]*3 + [2]*3 + [3]*3 + [4],
                   'job_id': ['a', 'b', 'c']*3 + ['h'],
                   'affinity': [7,7,5,0,4,2,4,8,1,10]})

Code

df1 = df.pivot(index='applicant_id', columns='job_id', values='affinity').fillna(-10**8)
#job_id                  a            b            c            h
#applicant_id                                                    
#1                     7.0          7.0          5.0 -100000000.0
#2                     0.0          4.0          2.0 -100000000.0
#3                     4.0          8.0          1.0 -100000000.0
#4            -100000000.0 -100000000.0 -100000000.0         10.0

opt = optimize.linear_sum_assignment(df1.to_numpy()*-1)
pd.DataFrame(df1.lookup(df1.index[opt[0]], df1.columns[opt[1]]), 
             columns=['affinity'],
             index=pd.MultiIndex.from_arrays([df1.index[opt[0]], df1.columns[opt[1]]]))

Output:

                     affinity
applicant_id job_id          
1            a            7.0
2            c            2.0
3            b            8.0
4            h           10.0

With more jobs than people we assign everyone, but some jobs remain unfilled. With more people than jobs, some low-affinity people remain unassigned.

Pandas matching algorithm

Tags:

python

pandas

BlindDriver

1 Answers

Sample Data

Code

Output:

ALollz

Recent Activity

Donate For Us

Pandas matching algorithm

Tags:

python

pandas

BlindDriver

1 Answers

Sample Data

Code

Output:

ALollz

Related questions

Recent Activity

Donate For Us