I have two columns in my dataset, col1 and col2. I want to group the data by col1 and then sort the groups by their size. That is, I want to display the groups in ascending order of size.
I have written the code for grouping and displaying the data as follows:
grouped_data = df.groupby('col1')
"""code for sorting comes here"""
for name, group in grouped_data:
    print(name)
    print(group)
Before displaying the data, I need to sort it as per group size, which I am not able to do.
To group a pandas DataFrame, use groupby(). To sort the result in ascending or descending order, use sort_values(). On a grouped object, size() returns the number of rows in each group.
Sort Values in Descending Order with Groupby
You can sort values in descending order by passing ascending=False to sort_values(). The head() function returns the first n rows, which is useful for quickly checking that your object holds the data you expect.
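As a minimal sketch of the above, assuming a small made-up DataFrame (the sample values and the names sizes_desc and top_two are only for illustration), the group sizes can be sorted largest first and then trimmed with head():

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "col1": ["a", "b", "b", "c", "c", "c"],
    "col2": [1, 2, 3, 4, 5, 6],
})

# Number of rows per group, largest group first.
sizes_desc = df.groupby("col1").size().sort_values(ascending=False)

# Keep only the two largest groups.
top_two = sizes_desc.head(2)
print(top_two)
```

Dropping ascending=False gives the ascending order asked for in the question, since sort_values() sorts ascending by default.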
Select a cell in the column you want to sort. On the Data tab, in the Sort & Filter group, click Sort. In the Sort dialog box, under Column, in the Sort by or Then by box, select the column that you want to sort by a custom list. Under Order, select Custom List.
Pandas groupby is used when we want to split data into groups and then operate on those groups: aggregating the data, transforming it through per-group computations, or filtering groups according to specific conditions.
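The three kinds of operation can be illustrated roughly as follows. This is only a sketch on made-up data; the variable names and the "at least two rows" condition are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "col1": ["a", "b", "b", "c", "c", "c"],
    "col2": [1, 2, 3, 4, 5, 6],
})
grouped = df.groupby("col1")

# Aggregation: one value per group (sum of col2).
totals = grouped["col2"].sum()

# Transformation: same shape as the input; here each value minus its group mean.
centered = df["col2"] - grouped["col2"].transform("mean")

# Filtration: keep only the rows of groups with at least two rows.
big_groups = grouped.filter(lambda g: len(g) >= 2)

print(totals)
print(centered)
print(big_groups)
```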
For pandas 0.17+, use sort_values:
df.groupby('col1').size().sort_values(ascending=False)
For pre-0.17, you can use size().order():
df.groupby('col1').size().order(ascending=False)
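The calls above return the sorted sizes, not the groups themselves. One possible way to display the groups in ascending order of size, as the question asks, is to walk the sorted index and fetch each group with get_group(). This is a sketch on made-up data, and ordered_names is just an illustrative name:

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "col1": ["x", "x", "x", "y", "z", "z"],
    "col2": [1, 2, 3, 4, 5, 6],
})
grouped_data = df.groupby("col1")

# Group sizes, smallest first (sort_values is ascending by default).
sizes = grouped_data.size().sort_values()

# The index of the sorted Series holds the group keys in the wanted order.
ordered_names = list(sizes.index)

for name in ordered_names:
    print(name)
    print(grouped_data.get_group(name))
```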
You can use python's sorted:
In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=['a', 'b', 'c'], columns=['A', 'B'])
In [12]: g = df.groupby('A')
In [13]: sorted(g,  # iterates pairs of (key, corresponding subDataFrame)
                key=lambda x: len(x[1]),  # sort by number of rows (len of subDataFrame)
                reverse=True)  # reverse the sort i.e. largest first
Out[13]:
[(1, A B
a 1 2
b 1 4),
(5, A B
c 5 6)]
Note: as an iterator, g iterates over pairs of the key and the corresponding subframe:
In [14]: list(g) # happens to be the same as the above...
Out[14]:
[(1, A B
a 1 2
b 1 4),
(5, A B
c 5 6)]
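For the ascending order asked for in the question, the same sorted() approach should work with reverse=True dropped. A small sketch reusing the DataFrame from the session above (ascending_groups is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]],
                  index=["a", "b", "c"], columns=["A", "B"])
g = df.groupby("A")

# Sort the (key, sub-DataFrame) pairs by number of rows, smallest group first.
ascending_groups = sorted(g, key=lambda pair: len(pair[1]))

for name, group in ascending_groups:
    print(name)
    print(group)
```

Because sorted() is stable, groups of equal size keep the order in which groupby yields them.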