Say you have this MultiIndex-ed DataFrame:
import pandas as pd

df = pd.DataFrame({'co':['DE','DE','FR','FR'], 'tp':['Lake','Forest','Lake','Forest'], 'area':[10,20,30,40], 'count':[7,5,2,3]})
df = df.set_index(['co','tp'])
Which looks like this:
           area  count
co tp
DE Lake      10      7
   Forest    20      5
FR Lake      30      2
   Forest    40      3
I would like to retrieve the unique values per index level. This can be accomplished using
df.index.levels[0]  # returns ['DE', 'FR']
df.index.levels[1]  # returns ['Forest', 'Lake'] (levels are stored sorted)
What I would really like to do is retrieve these lists by addressing the levels by their names, i.e. 'co' and 'tp'. The two shortest ways I could find look like this:
list(set(df.index.get_level_values('co')))  # returns ['DE', 'FR']
df.index.levels[df.index.names.index('co')]  # returns ['DE', 'FR']
But neither of them is very elegant. Is there a shorter way?
You can get the unique values of one or more columns of a pandas DataFrame using Series.unique() or pandas.unique(). Series.unique() returns the unique values of a single column; pandas.unique() accepts any one-dimensional array-like, so it can be applied to the flattened values of several columns.
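For columns rather than index levels, a minimal sketch of both calls might look like this. It assumes the question's data before set_index is applied; the names flat, single and multi are only illustrative.

```python
import pandas as pd

# The question's data, kept as ordinary columns (no set_index yet).
flat = pd.DataFrame({
    'co': ['DE', 'DE', 'FR', 'FR'],
    'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
    'area': [10, 20, 30, 40],
    'count': [7, 5, 2, 3],
})

# Unique values of a single column.
single = list(flat['co'].unique())

# Unique values across several columns: flatten the 2-D values to 1-D first,
# because pandas.unique() expects a one-dimensional input.
multi = list(pd.unique(flat[['co', 'tp']].to_numpy().ravel()))

print(single)
print(multi)
```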
Pandas 0.23.0 finally introduced a much cleaner solution to this problem: the level argument to Index.unique():
In [3]: df.index.unique(level='co')
Out[3]: Index(['DE', 'FR'], dtype='object', name='co')
This is now the recommended solution. It is far more efficient because it avoids creating a complete representation of the level values in memory, and re-scanning it.
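As a self-contained sketch of this approach, the snippet below rebuilds the question's DataFrame and fetches each level's unique values by name as plain lists. It assumes pandas 0.23.0 or later; co_values and tp_values are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'co': ['DE', 'DE', 'FR', 'FR'],
    'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
    'area': [10, 20, 30, 40],
    'count': [7, 5, 2, 3],
}).set_index(['co', 'tp'])

# Address each level by name; .tolist() turns the resulting Index into a list.
co_values = df.index.unique(level='co').tolist()
tp_values = df.index.unique(level='tp').tolist()

print(co_values)
print(tp_values)
```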
I guess you want the unique values in a certain level of a MultiIndex, addressed by level name. I usually do the following, which is a bit long.
In [11]: df.index.get_level_values('co').unique()
Out[11]: array(['DE', 'FR'], dtype=object)
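To collect the unique values for every level at once, one option is a dict keyed by level name. The sketch below assumes pandas 0.23.0 or later for the level argument, uses by_name as an illustrative name, and also checks that the longer get_level_values(...).unique() spelling gives the same values.

```python
import pandas as pd

df = pd.DataFrame({
    'co': ['DE', 'DE', 'FR', 'FR'],
    'tp': ['Lake', 'Forest', 'Lake', 'Forest'],
    'area': [10, 20, 30, 40],
    'count': [7, 5, 2, 3],
}).set_index(['co', 'tp'])

# Map each level name to the unique values found in that level.
by_name = {
    name: df.index.unique(level=name).tolist()
    for name in df.index.names
}

# Cross-check against the longer spelling, level by level.
for name in df.index.names:
    longer = df.index.get_level_values(name).unique().tolist()
    assert longer == by_name[name]

print(by_name)
```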