Combinations of MultiIndex levels which occur in a DataFrame

Tags: python, pandas

I have a MultiIndexed DataFrame like this:

In [1]: import numpy as np; import pandas as pd
In [2]: ix = pd.MultiIndex.from_product([[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
In [3]: data = np.arange(len(ix))
In [4]: df = pd.DataFrame(data, index=ix, columns=['hi'])
In [43]: df = df[~df.hi.isin([2, 3])]
In [44]: df
Out[44]: 
           hi
a b   c      
1 foo baz   0
      can   1
2 foo baz   4
      can   5
  bar baz   6
      can   7
3 foo baz   8
      can   9
  bar baz  10
      can  11

I'd like to know which pairs of the levels of a and b occur in the DataFrame:

[(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')]

I can do this using pd.unique and df.index.get_level_values but it seems kind of rubbish:

In [66]: pd.unique(zip(df.index.get_level_values(0), df.index.get_level_values(1)))
Out[66]: array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)

Is there a "nice" way?

LondonRob asked Aug 13 '15 14:08


3 Answers

In [22]: df.reset_index().set_index(['a','b']).index.unique()
Out[22]: array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
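
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same chain. It assumes a reasonably recent pandas, where `unique()` on a MultiIndex gives back a MultiIndex rather than the array shown above, so `tolist()` is used to get plain tuples. The names `ab_index` and `pairs` are mine, not part of the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c']
)
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
df = df[~df.hi.isin([2, 3])]

# Move every level into columns, then re-index on just 'a' and 'b'.
ab_index = df.reset_index().set_index(['a', 'b']).index.unique()

# tolist() turns the index into a plain list of tuples,
# in order of first appearance.
pairs = ab_index.tolist()
print(pairs)
```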
Jeff answered Nov 11 '22 12:11


You can call droplevel on your multi-index and then unique to obtain the list you desire:

In [126]: df.index.droplevel('c').unique()
Out[126]: array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
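
If you need this for other level combinations, the same idea can be wrapped up as below. This is only a sketch: `level_combinations` is a hypothetical helper name of my own, and it assumes the index levels are named, as they are in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c']
)
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
df = df[~df.hi.isin([2, 3])]


def level_combinations(frame, keep):
    """Hypothetical helper: unique combinations of the index levels in `keep`."""
    # Drop every named level that was not asked for.
    drop = [name for name in frame.index.names if name not in keep]
    # The surviving levels stay in their original index order,
    # regardless of the order given in `keep`.
    return frame.index.droplevel(drop).unique().tolist()


ab_pairs = level_combinations(df, ['a', 'b'])
bc_pairs = level_combinations(df, ['b', 'c'])
print(ab_pairs)
print(bc_pairs)
```

Dropping the unwanted levels directly on the index avoids the round trip through `reset_index`, which is the main reason to prefer it when only the index is of interest.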
EdChum answered Nov 11 '22 13:11


It's difficult to access index columns the same way as data columns, so the problem becomes much easier if you reset the index before trying:

>>> dff = df.reset_index()

dff now looks like this:

   a    b    c  hi
0  1  foo  baz   0
1  1  foo  can   1
2  2  foo  baz   4
3  2  foo  can   5
4  2  bar  baz   6
5  2  bar  can   7
6  3  foo  baz   8
7  3  foo  can   9
8  3  bar  baz  10
9  3  bar  can  11

Now it's relatively simple to get the values you want. My first fumbling attempt was:

>>> pd.unique(zip(dff.a, dff.b))
array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)

This is more readable but, as @LondonRob pointed out, once the index has been reset there is no need to zip the columns together. You get the same result from the original table, without binding the re-indexed DataFrame to a variable, simply by selecting the columns with a list of names:

>>> pd.unique(df.reset_index()[['a', 'b']].values)
array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
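
A variation on the same reset-the-index idea, offered as a sketch rather than a replacement: `drop_duplicates` works on the two-column frame directly, which sidesteps the question of how `pd.unique` treats a two-dimensional array on any given pandas version. The names `unique_ab` and `pairs` are my own.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c']
)
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
df = df[~df.hi.isin([2, 3])]

# Turn the index levels into ordinary columns and keep only 'a' and 'b'.
dff = df.reset_index()

# drop_duplicates keeps the first occurrence of each (a, b) row.
unique_ab = dff[['a', 'b']].drop_duplicates()

# name=None makes itertuples yield plain tuples instead of namedtuples.
pairs = list(unique_ab.itertuples(index=False, name=None))
print(pairs)
```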
holdenweb answered Nov 11 '22 14:11