
Get unique values from index column in MultiIndex

Tags: python, pandas

I know that I can get the unique values of a MultiIndex level by resetting the index, but is there a way to avoid this step and get the unique values directly?

Given I have:

           C
    A B
    0 one  3
    1 one  2
    2 two  1

I can do:

    df = df.reset_index()
    uniq_b = df.B.unique()
    df = df.set_index(['A','B'])

Is there a built-in way in pandas to do this?

seth asked Dec 15 '12 01:12





2 Answers

One way is to use index.levels:

    In [11]: df
    Out[11]:
           C
    A B
    0 one  3
    1 one  2
    2 two  1

    In [12]: df.index.levels[1]
    Out[12]: Index([one, two], dtype=object)
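
For anyone who wants to try this, here is a minimal sketch that rebuilds the question's frame and reads the level. The construction via `set_index` and the names `uniq_b`, `b_pos`, and `uniq_b_by_name` are assumptions made for illustration; the original post only shows the printed frame.

```python
import pandas as pd

# Rebuild a frame shaped like the one in the question:
# index levels A and B, one data column C (construction is assumed).
df = pd.DataFrame(
    {"A": [0, 1, 2], "B": ["one", "one", "two"], "C": [3, 2, 1]}
).set_index(["A", "B"])

# index.levels holds one Index per level, in level order,
# so position 1 corresponds to level B here.
uniq_b = df.index.levels[1]

# Looking the position up by name avoids hard-coding the 1.
b_pos = df.index.names.index("B")
uniq_b_by_name = df.index.levels[b_pos]

print(list(uniq_b))
```

Looking the level up by name is just a convenience; it should give the same result as `levels[1]` for this frame.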
Andy Hayden answered Oct 10 '22 15:10



Andy Hayden's answer (index.levels[blah]) is great for some scenarios, but can lead to odd behavior in others. My understanding is that pandas goes to great lengths to "reuse" indices when possible, so that lots of similarly indexed DataFrames don't each take up memory for their own copy. As a result, a DataFrame sliced from a larger one can keep the parent's full set of levels, which leads to the following annoying behavior:

    import pandas as pd
    import numpy as np

    np.random.seed(0)

    idx = pd.MultiIndex.from_product([['John', 'Josh', 'Alex'], list('abcde')],
                                     names=['Person', 'Letter'])
    large = pd.DataFrame(data=np.random.randn(15, 2),
                         index=idx,
                         columns=['one', 'two'])
    # Keep only the rows whose Person label starts with 'Jo'
    small = large.loc[['Jo' == d[0:2] for d in large.index.get_level_values('Person')]]

    print(small.index.levels[0])
    print(large.index.levels[0])

Which outputs

    Index([u'Alex', u'John', u'Josh'], dtype='object')
    Index([u'Alex', u'John', u'Josh'], dtype='object')

rather than the expected

    Index([u'John', u'Josh'], dtype='object')
    Index([u'Alex', u'John', u'Josh'], dtype='object')

As one person pointed out on the other thread, one idiom that seems very natural and works properly would be:

    small.index.get_level_values('Person').unique()
    large.index.get_level_values('Person').unique()

I hope this helps someone else dodge the super-unexpected behavior that I ran into.
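
To make the difference concrete, here is a small sketch comparing `levels` with the `get_level_values(...).unique()` idiom on a filtered frame. It also shows `Index.unique(level=...)` and `MultiIndex.remove_unused_levels()`, which are not mentioned above and are only available in reasonably recent pandas versions, so treat them as optional alternatives. The zero-filled data, the `str.startswith` mask, and the variable names `stale`, `present`, `present_alt`, and `pruned` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["John", "Josh", "Alex"], list("abcde")], names=["Person", "Letter"]
)
# Placeholder data; the values do not matter for this comparison.
large = pd.DataFrame(np.zeros((15, 2)), index=idx, columns=["one", "two"])

# Keep only the rows whose Person label starts with "Jo".
mask = large.index.get_level_values("Person").str.startswith("Jo")
small = large.loc[mask]

# levels still lists every label from the original index.
stale = list(small.index.levels[0])

# Option 1: unique labels that actually occur in the remaining rows.
present = list(small.index.get_level_values("Person").unique())

# Option 2: the same result via Index.unique with a level argument.
present_alt = list(small.index.unique(level="Person"))

# Option 3: rebuild the index without unused labels, then read levels.
pruned = list(small.index.remove_unused_levels().levels[0])

print(stale, present, present_alt, pruned)
```

Options 1 and 2 leave the index untouched, while option 3 builds a new index, which may be preferable if `levels` is going to be read repeatedly afterwards.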

8one6 answered Oct 10 '22 15:10
