
Pandas: iterate over unique values of a column that is already in sorted order

I have constructed a pandas data frame in sorted order and would like to iterate over groups of rows that share the same value in a particular column. The groupby functionality seems useful for this, but as far as I can tell, performing a groupby gives no guarantee about the order of the keys. How can I extract the unique column values in the order in which they appear?

Here is an example data frame:

Foo,1
Foo,2
Bar,2
Bar,1

I would like a list ["Foo","Bar"] where the order is guaranteed by the order of the original data frame. I can then use this list to extract the appropriate rows. In my case the sort is actually defined by columns that are also in the data frame (not included in the example above), so a solution that re-sorts would be acceptable if the information cannot be pulled out directly.

Setjmp asked Dec 18 '13 17:12



1 Answer

As mentioned in the comments, you can call unique on the column, which returns the values in order of first appearance (unlike numpy's unique, it doesn't sort):

In [11]: df
Out[11]: 
     0  1
0  Foo  1
1  Foo  2
2  Bar  2
3  Bar  1

In [12]: df[0].unique()
Out[12]: array(['Foo', 'Bar'], dtype=object)
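To make the difference from numpy concrete, here is a minimal sketch that rebuilds the example frame from the question (the construction of df and the variable names ordered_keys and sorted_keys are assumptions for illustration) and compares the two calls:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (columns are labelled 0 and 1).
df = pd.DataFrame([["Foo", 1], ["Foo", 2], ["Bar", 2], ["Bar", 1]])

# pandas: values come back in order of first appearance.
ordered_keys = list(df[0].unique())

# numpy: values come back sorted, so the original order is lost.
sorted_keys = list(np.unique(df[0]))

print(ordered_keys)
print(sorted_keys)
```

Here ordered_keys keeps "Foo" ahead of "Bar", while sorted_keys puts "Bar" first.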

Then you can access the relevant rows using groupby's get_group:

In [13]: g = df.groupby(0)

In [14]: g.get_group('Foo')
Out[14]: 
     0  1
0  Foo  1
1  Foo  2    
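Putting the two pieces together, a sketch of the full iteration might look like the following. It again assumes the example frame from the question; the names via_unique and via_groupby are hypothetical. The second option relies on groupby's sort=False argument, which is not part of the answer above but yields groups in order of first appearance:

```python
import pandas as pd

# Rebuild the example frame from the question (columns are labelled 0 and 1).
df = pd.DataFrame([["Foo", 1], ["Foo", 2], ["Bar", 2], ["Bar", 1]])

# Option 1: take the keys in order of first appearance, then look up each group.
g = df.groupby(0)
via_unique = [(key, g.get_group(key)) for key in df[0].unique()]

# Option 2: let groupby keep the order of first appearance itself.
via_groupby = [(key, group) for key, group in df.groupby(0, sort=False)]

for key, group in via_groupby:
    # Rows within each group keep their original relative order.
    print(key, group[1].tolist())
```

Option 2 avoids the separate key lookup, so it is probably the simpler choice when all you need is to walk the groups once.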
Andy Hayden answered Sep 19 '22 22:09
