
Python Pandas self join for merge cartesian product to produce all combinations and sum

I am brand new to Python. It seems to offer a lot of flexibility and to be faster than a traditional RDBMS for this kind of work.

I am working on a very simple process to create random fantasy teams. I come from an RDBMS background (Oracle SQL), which does not seem optimal for this kind of data processing.

I read a CSV file into a pandas DataFrame and now have a simple DataFrame with two columns, Name and Salary:

`                    Name  Salary
0              Jason Day   11700
1         Dustin Johnson   11600
2           Rory McIlroy   11400
3          Jordan Spieth   11100
4         Henrik Stenson   10500
5         Phil Mickelson   10200
6            Justin Rose    9800
7             Adam Scott    9600
8          Sergio Garcia    9400
9          Rickie Fowler    9200`

What I am trying to do in Python (pandas) is produce all combinations of 6 players whose total salary falls within a certain range, 45000 to 50000.

Looking into Python options, I found itertools.combinations interesting, but it would produce a massive list of combinations with no filtering on the salary sum.

In traditional SQL, I would do a massive Cartesian self-join with SUM, but then I get the same players in different positions, such as A, B, C and then C, B, A.

My traditional SQL which doesn't work well enough is something like this:

`SELECT DISTINCT
  ONE.name AS "1",
  TWO.name AS "2",
  THREE.name AS "3",
  FOUR.name AS "4",
  FIVE.name AS "5",
  SIX.name AS "6",
  SUM(ONE.salary + TWO.salary + THREE.salary + FOUR.salary + FIVE.salary + SIX.salary) AS salary
FROM
  nl.pgachamp2 ONE, nl.pgachamp2 TWO, nl.pgachamp2 THREE,
  nl.pgachamp2 FOUR, nl.pgachamp2 FIVE, nl.pgachamp2 SIX
WHERE ONE.name != TWO.name
  AND ONE.name != THREE.name
  AND ONE.name != FOUR.name
  AND ONE.name != FIVE.name
  AND TWO.name != THREE.name
  AND TWO.name != FOUR.name
  AND TWO.name != FIVE.name
  AND TWO.name != SIX.name
  AND THREE.name != FOUR.name
  AND THREE.name != FIVE.name
  AND THREE.name != SIX.name
  AND FIVE.name != SIX.name
  AND FOUR.name != SIX.name
  AND FOUR.name != FIVE.name
  AND ONE.name != SIX.name
GROUP BY ONE.name, TWO.name, THREE.name, FOUR.name, FIVE.name, SIX.name`

Is there a way to do this in Pandas/Python?

Any documentation that can be pointed to would be great!

FireLeast asked Mar 12 '23 14:03


1 Answer

I ran this for combinations of 6 and found no teams that satisfied the salary range with the sample data, so I used 5 instead.
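A quick way to see why 6 cannot work on the sample rows is to total the cheapest possible team. The sketch below rebuilds the ten rows from the question by hand; `cheapest_total` is a hypothetical helper name used only for illustration, not a pandas function.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Jason Day", "Dustin Johnson", "Rory McIlroy", "Jordan Spieth",
             "Henrik Stenson", "Phil Mickelson", "Justin Rose", "Adam Scott",
             "Sergio Garcia", "Rickie Fowler"],
    "Salary": [11700, 11600, 11400, 11100, 10500,
               10200, 9800, 9600, 9400, 9200],
})


def cheapest_total(salaries, team_size):
    """Sum of the `team_size` lowest salaries (hypothetical helper)."""
    return int(salaries.nsmallest(team_size).sum())


min_six = cheapest_total(df["Salary"], 6)
min_five = cheapest_total(df["Salary"], 5)
print(min_six, min_five)  # 58700 48200
```

The cheapest six players already exceed the 50000 cap, while the cheapest five fit inside the range, so only five-player teams can qualify with these ten rows.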

This should get you there:

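from itertools import combinations
import pandas as pd

# Assumes df is the Name/Salary DataFrame from the question.
# Turn it into a Series of salaries indexed by player name.
s = df.set_index('Name').squeeze()
# One row per 5-player combination; each set of players appears once.
combos = pd.DataFrame(list(combinations(s.index, 5)))
# .ix was removed in pandas 1.0, so use label-based .loc instead.
combo_salary = combos.apply(lambda x: s.loc[list(x)].sum(), axis=1)
# Keep only the teams whose total salary is inside the range.
combos[(combo_salary >= 45000) & (combo_salary <= 50000)]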
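If the player list grows, building a DataFrame of every combination before filtering may become expensive. As an alternative sketch (not the approach above), the filter can be applied while iterating so that only qualifying teams are kept. The helper name `teams_in_range` and the `P1` to `P5` column names are assumptions made for illustration, and the data is the sample from the question.

```python
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Jason Day", "Dustin Johnson", "Rory McIlroy", "Jordan Spieth",
             "Henrik Stenson", "Phil Mickelson", "Justin Rose", "Adam Scott",
             "Sergio Garcia", "Rickie Fowler"],
    "Salary": [11700, 11600, 11400, 11100, 10500,
               10200, 9800, 9600, 9400, 9200],
})


def teams_in_range(players, team_size, low, high):
    """Yield (names, total) for each team whose total is in [low, high].

    Hypothetical helper, not part of pandas or itertools.
    """
    rows = list(zip(players["Name"], players["Salary"]))
    # combinations() emits each set of players once, in input order,
    # so (A, B, C) never reappears as (C, B, A).
    for team in combinations(rows, team_size):
        total = sum(salary for _, salary in team)
        if low <= total <= high:
            yield tuple(name for name, _ in team), int(total)


team_size = 5
columns = [f"P{i}" for i in range(1, team_size + 1)] + ["Salary"]
teams = pd.DataFrame(
    [(*names, total)
     for names, total in teams_in_range(df, team_size, 45000, 50000)],
    columns=columns,
)
print(teams)
```

With 10 players this walks 252 five-player sets rather than the 30,240 ordered arrangements a self-join on `!=` would produce, which is what removes the duplicate orderings described in the question.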


piRSquared answered Apr 28 '23 07:04