Problem statement: There are 5 projects and 15 employees and the numbers against each column shows the interest of each employee in a given project. Each project can have a maximum of 3 employees. Scores are from 1-5 1 being the highest preference and 5 being the lowest preference. I have to divide the employees among the projects in such a way that least number of people get dissatisfied or minimum score. Note that my algorithm creates all possible combinations and then sorts those combinations with sum in ascending order and picks up top 5 combinations with distinct employees.
But here is the problem, so, for example, my sortedsum matrix is [1,1,1,4,9,9,...] now this algorithmically is correct, but the thing is if I pick top 5 of them my total sum will be 16. But there could be a possibility where instead of taking [1,1,1,4,9] if I take [2,1,1,4] as the first four then the fifth project team sum goes to 3 and in that way the minimum will change and here's the point where my algorithm fails.
I have a 3nXn matrix, for this example, I will take it as a 15x5: So the matrix looks like this (https://i.stack.imgur.com/omcq7.png):
df = pd.read_csv(io.StringIO("""
employee proj_A proj_B proj_C proj_D proj_E
A1 1 5 3 4 2
B1 5 4 1 2 3
C1 2 3 4 1 5
A2 4 2 1 3 5
B2 4 5 3 2 1
C2 3 1 2 5 4
A3 1 2 4 3 5
B3 2 3 1 5 4
C3 5 3 4 1 2
A4 4 5 3 2 1
B4 5 3 4 2 1
C4 1 2 3 4 5
A5 1 3 2 5 4
B5 2 1 3 5 4
C5 2 1 4 5 4
"""), sep=r"\s+")
[formatted to make it easy to paste into the shell]
The problem I want to solve is to pick three distinct elements in each column, distinct meaning from distinct rows in such a way that their sum taken together for all 5 column remains the least.
For example, here if I pick A1,B1,C1 for A and then A2,B2,C2 for B and so on then the sum which is 1+5+2=8 for A and 2+5+1=8 for B and so on when added together i.e. 8+8+... should be the minimum sum of all possible combinations. Note that if A1, B1 and C1 are assigned to A they can't switch to B or any other next column.
What I tried was created all possible combinations starting from A1, B1, C1 to A5, B5 and C5 and computed their sums and sorted it in increasing order and picked first five which had distinct elements as shown below:
Limitations with my code: 1. It takes so much time for the matrix I am optimizing (which is a 30x10 matrix) as the combinations are way too much. 2. It will ignore any scenario where by compromising the scores for the initial elements to a littlebit higher, we may get the mid-scores which can be reduced a lot.
import pandas as pd
data=pd.read_csv("csvfile.csv")
teamsize=3
employes=data["Name"]
PScore=[]
for i in range(10):
PScore.append(data[f"Project {i+1}"])
Scorings_combo=[]
for i in range(len(employes)):
for j in range(len(employes)):
for k in range(len(employes)):
for l in range(10):
if i==j or j==k or k==i:
break
score=0
score=score+PScore[l][i]+PScore[l][j]+PScore[l][k]
Scorings_combo.append([i+1,j+1,k+1,l+1,score])
a=[Scorings_combo[i][4]for i in range(len(Scorings_combo))]
#b=sorted(a,reverse=True)
b=sorted(a)
emps=[]
sig=1
empl=[]
passigned=[]
countee=0
for i in range(len(b)):
for j in range(3):
if Scorings_combo[a.index(b[i])][j] in emps or Scorings_combo[a.index(b[i])][3] in passigned:
a[a.index(b[i])]=-1
sig=0
break
if sig!=0:
print("New")
for k in range(3):emps.append(Scorings_combo[a.index(b[i])][k])
empl.append(Scorings_combo[a.index(b[i])])
passigned.append(Scorings_combo[a.index(b[i])][3])
countee=countee+1
if count==8:
break
sig=1
print(f"Iteration:{i}/{len(b)}")
For example: 3,3,3,4,9 will be the solution even if the following is possible: 4,4,4,4,4 because it will look for the descending order distinct elements which gives me the first solution.
Kindly do help me if you have any ideas. Thanks
Here is the drive link to the data: https://drive.google.com/file/d/1yaswBEi3RzrhQ743hJTnUeZFZNo-QBBR/view?usp=sharing
Here's a simpler example: Matrix=[[1,2],[2,1],[1,2],[1,2],[2,1],[1,2]] Now minimum possible combination is: for the first column: [1,1,1] and [1,1,2] for the second column.
The question describes a variation of the "stable marriage problem". Which has a pretty simple algorithm; I pasted in the simplest version below - there are other versions that are more stringent. Suitors who aren't engaged, propose to the other party, which either rejects or accepts (removing their current engagement). This goes on until all parties are engaged.
# Gale-Shapley algorithm:
algorithm stable_matching is
Initialize all m ∈ M and w ∈ W to free
while ∃ free man m who still has a woman w to propose to do
w := first woman on m`s list to whom m has not yet proposed
if w is free then
(m, w) become engaged
else some pair (m`, w) already exists
if w prefers m to m` then
m` becomes free
(m, w) become engaged
else
(m`, w) remain engaged
end if
end if
repeat
Modelling a project distribution solution after the stable marriage problem, we can treat the Teammate
s as the suitors, and the other party as the polygamous Project
, which can be engaged to more than one suitor. This may seem like a lot of code, but it's relatively short and easy to follow.
If the above algorithm is followed exactly, the project distributions may contain the 3rd or 4th picks of some of the teammates - which isn't a desirable situation. So, I found that randomizing the order of assignments and retrying several times while keeping track of the best distribution produced the best results.
Each time this is run, it will likely produce an equivalently scored project distribution, but with teammates arranged differently. So if a teammate decides they don't like the 1 or 2 pick for some reason, just generate a new distribution.
Runnable code at: https://pastebin.com/kVj0FuJP
import io
import pandas as pd
from copy import deepcopy
from itertools import cycle
from random import randint, shuffle
from statistics import mean, stdev
class Teammate:
def __init__(self, identifier, projects, rankings):
self._id = identifier
self._projects = list(projects)
self._rankings = dict(zip(projects, rankings))
self._prefs = self.gen_project_preferences_cycle()
self._project = None
@property
def id(self):
return self._id
def rank(self, project):
return self._rankings[project]
@property
def state(self):
return 'ENGAGED' if self._project else 'FREE'
def propose(self, project):
return project.consider(self)
def engage(self, project):
self._project = project
project.add(self)
def disengage(self, project):
self._project = None
project.remove(self)
@property
def project_preferences(self):
return self._prefs
def gen_project_preferences_cycle(self):
""" Returns a generator for a cyclical list of preferences in
order.
"""
prefs = sorted(self._projects, key=self.rank)
if randint(0, 1):
prefs.insert(0, prefs[1])
return cycle(prefs)
def reset(self):
self._project = None
self._prefs = self.gen_project_preferences_cycle()
By randomly assigning the second pick to the front of some Teammate
's preference lists, we get an interesting effect on the placements. By starting some teammates off on their 2nd pick, it's likely they're going to get bumped out of their 2nd choice and then find another project they rated 1. The net effect is you get matrices with mostly 1's, a small fraction of 2's, and no 3's and 4's.
The more interesting method of the Project
class is the consider()
method where the Project
considers the engagement proposal of a Teammate
. The fickle Project will cast aside any engaged suitor in its set of current suitors, if another comes along with a better offer.
class Project:
def __init__(self, identifier, seats):
self._id = identifier
self._seats = seats
self._team = []
@property
def id(self):
return self._id
@property
def score(self):
return sum(tm.rank(self) for tm in self._team)
def consider(self, teammate):
""" Consider a proposal, have the Teammate establish the engagement
if accepted, then return True; otherwise return False.
"""
rank = teammate.rank(self)
team = self._team
seats = self._seats
if len(team) < seats or any(rank < tm.rank(self) for tm in team):
# Either there's a seat available, or the newcomer has a better
# offer than one of the current suitors.
if len(team) >= seats:
shuffle(team)
tm_out = max(team, key=lambda tm: tm.rank(self))
tm_out.disengage(self) # Hit the road Jack...
teammate.engage(self) # What? and no ring!?...
return True
return False
def add(self, teammate):
self._team.append(teammate)
def remove(self, teammate):
self._team.remove(teammate)
def clear(self):
self._team = []
@property
def state(self):
return 'FREE' if len(self._team) < self._seats else 'ENGAGED'
@property
def roster(self):
self._team.sort(key=lambda tm: tm.id)
return f"{self.id}: {[(tm.id, tm.rank(self)) for tm in self._team]}"
Of course, in the modern world everyone is using matchmaking services, and so do the Teammate
s and Projects
... The service is a bit quirky though and relies to some extent on random tries until it finds the best matchings for all parties.
class ProjectMatcher:
def __init__(self, projects, teammates):
self._projects = projects
self._teammates = teammates
def match(self, num_iterations=1000):
best_score = 99999
best_distrib = None
for _ in range(num_iterations):
# This is the piece that does the actual matching. You can
# see each line corresponds to a step in the Gale-Shapely
# algorithm. You can find the details of some steps within
# the Teammate and Project methods.
for teammate in self.gen_free_teammates_cycle():
for project in teammate.project_preferences:
accepted = teammate.propose(project)
if accepted:
break
# Determine if this distribution is the best so far.
if self.evaluate() < best_score:
best_score = self.evaluate()
best_distrib = deepcopy(self._projects)
# Get ready for another round of matchmaking.
self.reset()
# Print out the best distribution of assignments found.
print("PROJECT ASSIGNMENTS:")
for project in best_distrib:
print(f" {project.roster}")
def evaluate(self):
# Determine the quality of the project assignments. We want a
# low average score, preferably without outliers.
s = [p.score for p in self._projects]
m = mean(s)
d = stdev(s, m)
return m + d
def gen_free_teammates_cycle(self):
""" Returns a cyclical generator of the list of Teammates that are
still unengaged.
"""
teammates = list(self._teammates)
shuffle(teammates)
while any(tm.state == 'FREE' for tm in teammates):
for tm in teammates:
if tm.state == 'FREE':
yield tm
shuffle(teammates)
def reset(self):
[p.clear() for p in self._projects]
[t.reset() for t in self._teammates]
@staticmethod
def from_dataframe(df):
n_seats = len(df.values) // (len(df.columns) - 1)
projects = [Project(proj, n_seats) for proj in df.columns[1:]]
teammates = [Teammate(tm[0], projects, tm[1:]) for tm in df.values]
return ProjectMatcher(projects, teammates)
And to put it all together...
if __name__ == '__main__':
df = pd.read_csv(io.StringIO("""
employee proj_A proj_B proj_C proj_D proj_E
A1 1 5 3 4 2
B1 5 4 1 2 3
C1 2 3 4 1 5
A2 4 2 1 3 5
B2 4 5 3 2 1
C2 3 1 2 5 4
A3 1 2 4 3 5
B3 2 3 1 5 4
C3 5 3 4 1 2
A4 4 5 3 2 1
B4 5 3 4 2 1
C4 1 2 3 4 5
A5 1 3 2 5 4
B5 2 1 3 5 4
C5 2 1 4 5 4
"""), sep=r'\s+')
pm = ProjectMatcher.from_dataframe(df)
pm.match()
Output:
$ python projectmatcher.py
PROJECT ASSIGNMENTS:
proj_A: [('A1', 1), ('A3', 1), ('C4', 1)]
proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
proj_C: [('A2', 1), ('A5', 2), ('B3', 1)]
proj_D: [('B1', 2), ('C1', 1), ('C3', 1)]
proj_E: [('A4', 1), ('B2', 1), ('B4', 1)]
The results are consistently good with maybe 100 iterations, it defaults to 1K for good measure, which only takes a blink of the eye. Somehow it manages to produce high quality matchings despite being based partly on chance.
The program can be modified to collect all the unique distributions that have the lowest score and generate a list. Each of these had the same lowest score, but the distributions are different:
PROJECT ASSIGNMENTS:
option 1
proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
proj_D: [('B4', 2), ('C1', 1), ('C3', 1)]
proj_E: [('A1', 2), ('A4', 1), ('B2', 1)]
option 2
proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
proj_D: [('A4', 2), ('C1', 1), ('C3', 1)]
proj_E: [('A1', 2), ('B2', 1), ('B4', 1)]
option 3
proj_A: [('A1', 1), ('A3', 1), ('C4', 1)]
proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
proj_C: [('A2', 1), ('A5', 2), ('B3', 1)]
proj_D: [('B1', 2), ('C1', 1), ('C3', 1)]
proj_E: [('A4', 1), ('B2', 1), ('B4', 1)]
option 4
proj_A: [('A3', 1), ('A5', 1), ('C4', 1)]
proj_B: [('B5', 1), ('C2', 1), ('C5', 1)]
proj_C: [('A2', 1), ('B1', 1), ('B3', 1)]
proj_D: [('B2', 2), ('C1', 1), ('C3', 1)]
proj_E: [('A1', 2), ('A4', 1), ('B4', 1)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With