
Pandas: Get unique MultiIndex level values by label

Tags: python, pandas

Say you have this MultiIndex-ed DataFrame:

import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]})
df = df.set_index(['co', 'tp'])

Which looks like this:

           area  count
co tp
DE Lake      10      7
   Forest    20      5
FR Lake      30      2
   Forest    40      3

I would like to retrieve the unique values per index level. This can be accomplished using

df.index.levels[0]  # returns ['DE', 'FR']
df.index.levels[1]  # returns ['Forest', 'Lake'] (levels are stored sorted)

What I would really like to do is retrieve these lists by addressing the levels by name, i.e. 'co' and 'tp'. The two shortest ways I could find look like this:

list(set(df.index.get_level_values('co')))  # returns ['DE', 'FR']
df.index.levels[df.index.names.index('co')]  # returns ['DE', 'FR']

But neither of them is very elegant. Is there a shorter way?

ojdo asked Jun 30 '14 17:06



2 Answers

Pandas 0.23.0 finally introduced a much cleaner solution to this problem: the level argument to Index.unique():

In [3]: df.index.unique(level='co')
Out[3]: Index(['DE', 'FR'], dtype='object', name='co')

This is now the recommended solution. It is far more efficient because it avoids creating a complete representation of the level values in memory, and re-scanning it.
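As a minimal sketch, assuming pandas 0.23.0 or later and the example frame from the question, the snippet below looks up both levels by name. It also illustrates one practical difference from `df.index.levels`: after filtering, `levels` can still list values that no longer occur in the index, while `unique(level=...)` reports only the values actually present. The filtered frame `sub` and the variable names are hypothetical, introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

# Unique values of a level, addressed by name, in order of first appearance.
co_values = df.index.unique(level='co').tolist()
tp_values = df.index.unique(level='tp').tolist()

# Hypothetical filtered frame: only rows with area > 20 (both belong to 'FR').
sub = df[df['area'] > 20]

# unique(level=...) sees only the values that remain in the index ...
sub_unique = sub.index.unique(level='co').tolist()
# ... whereas .levels is not pruned by filtering and still lists 'DE'.
sub_levels = sub.index.levels[0].tolist()
```

If the stale entries in `levels` matter, `MultiIndex.remove_unused_levels()` can be used to prune them.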

Pietro Battiston answered Oct 05 '22 23:10


I guess you want the unique values of a certain MultiIndex level, addressed by level name. I usually do the following, which is a bit long.

In [11]: df.index.get_level_values('co').unique()
Out[11]: array(['DE', 'FR'], dtype=object)
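The output above comes from an older pandas release. On recent versions this call most likely returns an `Index` rather than a NumPy array, so the sketch below compares plain lists to stay independent of the return type. It assumes the example frame from the question and a pandas version that supports `Index.unique(level=...)` (0.23.0 or later); the variable names are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'co': ['DE', 'DE', 'FR', 'FR'],
                   'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
                   'area': [10, 20, 30, 40],
                   'count': [7, 5, 2, 3]}).set_index(['co', 'tp'])

# Route 1: materialize the full level column (one entry per row), then deduplicate.
via_values = list(df.index.get_level_values('co').unique())

# Route 2: ask the index for the unique values of the named level directly.
via_unique = list(df.index.unique(level='co'))

# Both routes yield the same values in the same order for this frame.
same_result = via_values == via_unique
```

The first route works on older pandas versions too, which is its main remaining advantage.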
Happy001 answered Oct 06 '22 00:10