I am brand new to Python, seems like it has a lot of flexibility and is faster than traditional RDBMS systems.
Working on a very simple process to create random fantasy teams. I come from an RDBMS background (Oracle SQL) and that does not seem to be optimal for this data processing.
I made a dataframe using pandas read from csv file and now have a simple dataframe with two columns -- Player, Salary:
` Name Salary
0 Jason Day 11700
1 Dustin Johnson 11600
2 Rory McIlroy 11400
3 Jordan Spieth 11100
4 Henrik Stenson 10500
5 Phil Mickelson 10200
6 Justin Rose 9800
7 Adam Scott 9600
8 Sergio Garcia 9400
9 Rickie Fowler 9200`
What I am trying to do via python (pandas) is produce all combinations of 6 players which salary is between a certain amount 45000 -- 50000.
In looking up python options, I found the itertools combination interesting, but it would result a massive list of combinations without filtering the sum of salary.
In traditional SQL, I would do a massive merge cartesian join w/ SUM, but then I get the players in different spots..
Such as A, B, C then, C, B, A..
My traditional SQL which doesn't work well enough is something like this:
` SELECT distinct
ONE.name AS "1",
TWO.name AS "2",
THREE.name AS "3",
FOUR.name AS "4",
FIVE.name AS "5",
SIX.name AS "6",
sum(one.salary + two.salary + three.salary + four.salary + five.salary + six.salary) as salary
FROM
nl.pgachamp2 ONE,nl.pgachamp2 TWO,nl.pgachamp2 THREE, nl.pgachamp2 FOUR,nl.pgachamp2 FIVE,nl.pgachamp2 SIX
where ONE.name != TWO.name
and ONE.name != THREE.name
and one.name != four.name
and one.name != five.name
and TWO.name != THREE.name
and TWO.name != four.name
and two.name != five.name
and TWO.name != six.name
and THREE.name != four.name
and THREE.name != five.name
and three.name != six.name
and five.name != six.name
and four.name != six.name
and four.name != five.name
and one.name != six.name
group by ONE.name, TWO.name, THREE.name, FOUR.name, FIVE.name, SIX.name`
Is there a way to do this in Pandas/Python?
Any documentation that can be pointed to would be great!
I ran this for combinations of 6 and found no teams that satisfied. I used 5 instead.
This should get you there:
from itertools import combinations
import pandas as pd
s = df.set_index('Name').squeeze()
combos = pd.DataFrame([c for c in combinations(s.index, 5)])
combo_salary = combos.apply(lambda x: s.ix[x].sum(), axis=1)
combos[(combo_salary >= 45000) & (combo_salary <= 50000)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With