Sample data:
mdf = pd.DataFrame([[1,2,50],[1,2,20],
[1,5,10],[2,8,80],
[2,5,65],[2,8,10]
], columns=['src','dst','n']); mdf
src dst n
0 1 2 50
1 1 2 20
2 1 5 10
3 2 8 80
4 2 5 65
5 2 8 10
groupby()
gives a two-level multi-index:
test = mdf.groupby(['src','dst'])['n'].agg(['sum','count']); test
sum count
src dst
1 2 70 2
5 10 1
2 5 65 1
8 90 2
Question: how to sort this DataFrame by src
ascending and then by sum
descending?
I'm a beginner with pandas, learned about sort_index() and sort_values(), but in this task it seems that I need both simultaneously.
Expected result, under each "src" sorting is determined by the "sum":
sum count
src dst
1 2 70 2
5 10 1
2 8 90 2
5 65 1
You can sort pandas DataFrame by one or multiple (one or more) columns using sort_values() method and by ascending or descending order. To specify the order, you have to use ascending boolean property; False for descending and True for ascending.
However, to sort MultiIndex at a specific level, use the multiIndex. sortlevel() method in Pandas. Set the level as an argument. To sort in descending order, use the ascending parameter and set to False.
To sort the DataFrame based on the values in a single column, you'll use . sort_values() . By default, this will return a new DataFrame sorted in ascending order. It does not modify the original DataFrame.
All of the sorting methods available in Pandas fall under the following three categories: Sorting by index labels; Sorting by column values; Sorting by a combination of index labels and column values.
In case anyone else comes across this using google as well. Since pandas version 0.23, you can pass the name of the level as an argument to sort_values
:
test.sort_values(['src','sum'], ascending=[1,0])
Result:
sum count
src dst
1 2 70 2
5 10 1
2 8 90 2
5 65 1
IIUC:
In [29]: test.sort_values('sum', ascending=False).sort_index(level=0)
Out[29]:
sum count
src dst
1 2 80 2
5 10 1
2 8 80 1
UPDATE: very similar to @anonyXmous's solution:
In [47]: (test.reset_index()
.sort_values(['src','sum'], ascending=[1,0])
.set_index(['src','dst']))
Out[47]:
sum count
src dst
1 2 70 2
5 10 1
2 8 90 2
5 65 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With