I have a DataFrame, and I'd like find all the permutations of it that fulfill a simple ascending sort on one of the columns. (There are many ties.) For example, in the following DataFrame
df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David", "Evan"],
'age': [28, 20, 21, 22, 21]})
I'd be looking to sort by age and obtain the orders ["Bob", "Chris", "Evan", "David", "Abe"]
and ["Bob", "Evan", "Chris", "David", "Abe"]
.
I'm new to python (and to pandas) and curious if there is a simple way to do this that I don't see.
Thanks!
Since you're grouping by age, let's do that and return all the permutations for each group and then take the product (using itertools' product and permutation functions):
In [11]: age = df.groupby("age")
If we look at the permutations of a single group:
In [12]: age.get_group(21)
Out[12]:
age name
2 21 Chris
4 21 Evan
In [13]: list(permutations(age.get_group(21).index))
Out[13]: [(2, 4), (4, 2)]
In [14]: [df.loc[list(p)] for p in permutations(age.get_group(21).index)]
Out[14]:
[ age name
2 21 Chris
4 21 Evan, age name
4 21 Evan
2 21 Chris]
We can do this on the entire DataFrame by returning just the index for each group (this assumes that the index is unique, if it's not reset_index
prior to doing this... you may be able to do something slightly more lower level):
In [21]: [list(permutations(grp.index)) for (name, grp) in age]
Out[21]: [[(1,)], [(2, 4), (4, 2)], [(3,)], [(0,)]]
In [22]: list(product(*[(permutations(grp.index)) for (name, grp) in age]))
Out[22]: [((1,), (2, 4), (3,), (0,)), ((1,), (4, 2), (3,), (0,))]
We can glue these up with sum:
In [23]: [sum(tups, ()) for tups in product(*[(permutations(grp.index)) for (name, grp) in age])]
Out[23]: [(1, 2, 4, 3, 0), (1, 4, 2, 3, 0)]
If you make these a list you can apply loc (which gets you the desired result):
In [24]: [df.loc[list(sum(tups, ()))] for tups in product(*[list(permutations(grp.index)) for (name, grp) in age])]
Out[24]:
[ age name
1 20 Bob
2 21 Chris
4 21 Evan
3 22 David
0 28 Abe, age name
1 20 Bob
4 21 Evan
2 21 Chris
3 22 David
0 28 Abe]
And the (list of) the name column:
In [25]: [list(df.loc[list(sum(tups, ())), "name"]) for tups in product(*[(permutations(grp.index)) for (name, grp) in age])]
Out[25]:
[['Bob', 'Chris', 'Evan', 'David', 'Abe'],
['Bob', 'Evan', 'Chris', 'David', 'Abe']]
Note: It may be faster to use a numpy permutation matrix and pd.tools.util.cartesian_product
. I suspect it's much of a muchness and wouldn't explore this unless this was unusably slow (it's potentially going to be slow anyway because there could be many many permutations)...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With