I have constructed a pandas data frame in sorted order and would like to iterate over groups of rows that share the same value in a particular column. The groupby functionality seems useful for this, but as far as I can tell, performing a groupby gives no guarantee about the order of the keys. How can I extract the unique column values in the data frame's sorted order?
Here is an example data frame:
Foo,1
Foo,2
Bar,2
Bar,1
I would like a list ["Foo","Bar"] where the order is guaranteed by the order of the original data frame. I can then use this list to extract the appropriate rows. In my case the sort is defined by columns that are also present in the data frame (not included in the example above), so a solution that re-sorts is acceptable if the information cannot be pulled out directly.
As mentioned in the comments, you can use unique on the column, which preserves the order of first appearance (unlike numpy's unique, it doesn't sort):
In [11]: df
Out[11]:
     0  1
0  Foo  1
1  Foo  2
2  Bar  2
3  Bar  1
In [12]: df[0].unique()
Out[12]: array(['Foo', 'Bar'], dtype=object)
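To make the contrast with numpy concrete, here is a minimal sketch. It assumes the example frame is rebuilt with the default integer column labels 0 and 1; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the integer column labels 0 and 1 are an assumption.
df = pd.DataFrame([["Foo", 1], ["Foo", 2], ["Bar", 2], ["Bar", 1]])

# pandas keeps the values in order of first appearance.
ordered = df[0].unique().tolist()

# numpy returns the unique values sorted.
sorted_keys = np.unique(df[0].to_numpy()).tolist()

print(ordered)      # ['Foo', 'Bar']
print(sorted_keys)  # ['Bar', 'Foo']
```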
Then you can access the relevant rows using groupby's get_group:
In [13]: g = df.groupby(0)
In [14]: g.get_group('Foo')
Out[14]:
     0  1
0  Foo  1
1  Foo  2
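Putting the two steps together, the loop the question asks for might look like the sketch below. It again assumes the example frame with integer column labels 0 and 1, and the names `groups` and `order` are hypothetical. As an alternative, passing sort=False to groupby should also yield the groups in order of first appearance, which avoids the separate unique call.

```python
import pandas as pd

# Rebuild the example frame; the integer column labels 0 and 1 are an assumption.
df = pd.DataFrame([["Foo", 1], ["Foo", 2], ["Bar", 2], ["Bar", 1]])

# Group by a scalar key so that get_group accepts a plain value.
g = df.groupby(0)

# Visit the groups in the order the keys first appear in the frame.
groups = {}
for key in df[0].unique():
    rows = g.get_group(key)
    groups[key] = rows[1].tolist()
    print(key, groups[key])

# Alternative: sort=False keeps the groups in order of first appearance.
order = [key for key, _ in df.groupby(0, sort=False)]
print(order)
```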