I have a MultiIndexed DataFrame like this:
In [1]: import numpy as np; import pandas as pd
In [2]: ix = pd.MultiIndex.from_product([[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
In [3]: data = np.arange(len(ix))
In [4]: df = pd.DataFrame(data, index=ix, columns=['hi'])
In [43]: df = df[~df.hi.isin([2, 3])]
In [44]: df
Out[44]:
           hi
a b   c
1 foo baz   0
      can   1
2 foo baz   4
      can   5
  bar baz   6
      can   7
3 foo baz   8
      can   9
  bar baz  10
      can  11
I'd like to know which pairs of the levels of a and b occur in the DataFrame:
[(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')]
I can do this using pd.unique and df.index.get_level_values, but it seems kind of rubbish:
In [66]: pd.unique(zip(df.index.get_level_values(0), df.index.get_level_values(1)))
Out[66]: array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
Is there a "nice" way?
In [22]: df.reset_index().set_index(['a','b']).index.unique()
Out[22]: array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
You can call droplevel on your MultiIndex and then unique to obtain the list you desire:
In [126]:
df.index.droplevel('c').unique()
Out[126]:
array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
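For anyone running this on a recent pandas release, here is a minimal self-contained sketch of the same droplevel approach. It assumes the frame built in the question; the variable name pairs is only illustrative. On current versions, unique() on a MultiIndex returns another MultiIndex rather than an object array, so tolist() is used to get a plain list of tuples.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c']
)
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
df = df[~df.hi.isin([2, 3])]

# Drop level 'c', then keep the first occurrence of each (a, b) pair.
ab_index = df.index.droplevel('c').unique()

# Convert the resulting MultiIndex to a plain list of tuples.
pairs = ab_index.tolist()
print(pairs)
```

The order of the pairs follows their first appearance in the index, so no sorting is applied.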
It's difficult to access index columns the same way as data columns, so the problem becomes much easier if you reset the index before trying:
>>> dff = df.reset_index()
dff now looks like this:
   a    b    c  hi
0  1  foo  baz   0
1  1  foo  can   1
2  2  foo  baz   4
3  2  foo  can   5
4  2  bar  baz   6
5  2  bar  can   7
6  3  foo  baz   8
7  3  foo  can   9
8  3  bar  baz  10
9  3  bar  can  11
Now it's relatively simple to get the values you want. My first fumbling attempt was:
>>> pd.unique(zip(dff.a, dff.b))
array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
This is more readable, but as @LondonRob pointed out, having reset the index there is no need to zip the columns together; you get the same result from the original table, without binding the re-indexed DataFrame to a variable, simply by selecting with a list of column names:
>>> pd.unique(df.reset_index()[['a', 'b']].values)
array([(1, 'foo'), (2, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'bar')], dtype=object)
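As far as I can tell, pd.unique on current pandas expects one-dimensional input, so the two-column .values call above may raise an error there instead of returning tuples. A hedged alternative that keeps the same reset-the-index idea is to select the two columns and call drop_duplicates. This is a sketch that assumes the frame from the question; unique_ab and pairs are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
ix = pd.MultiIndex.from_product(
    [[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c']
)
df = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['hi'])
df = df[~df.hi.isin([2, 3])]

# Turn the index levels into columns, keep only 'a' and 'b',
# and drop repeated rows (the first occurrence of each pair is kept).
unique_ab = df.reset_index()[['a', 'b']].drop_duplicates()

# name=None makes itertuples yield plain tuples instead of namedtuples.
pairs = list(unique_ab.itertuples(index=False, name=None))
print(pairs)
```

If a DataFrame of the distinct pairs is enough, the last step can be skipped and unique_ab used directly.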