Is there a way to get the result of get_level_values
for more than one column?
Given the following DataFrame
:
d
a b c
1 4 10 16
11 17
5 12 18
2 5 13 19
6 14 20
3 7 15 21
I wish to get the values (i.e. list of tuples) of levels a
and c
:
[(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
Notes:
It is impossible to give get_level_values
more than one level (e.g. df.index.get_level_values(['a','c']
)
There's a workaround in which one could use get_level_values
over each desired column and zip
them together:
For example:
a_list = df.index.get_level_values('a').values
c_list = df.index.get_level_values('c').values
print([i for i in zip(a_list,c_list)])
[(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
but it get cumbersome as the number of columns grow.
DataFrame
:df = pd.DataFrame({'a':[1,1,1,2,2,3],'b':[4,4,5,5,6,7,],'c':[10,11,12,13,14,15], 'd':[16,17,18,19,20,21]}).set_index(['a','b','c'])
The .tolist()
method of a MultiIndex
gives a list of tuples for all the levels in the MultiIndex
. For example, with your example DataFrame
,
df.index.tolist()
# => [(1, 4, 10), (1, 4, 11), (1, 5, 12), (2, 5, 13), (2, 6, 14), (3, 7, 15)]
So here are two ideas:
Get the list of tuples from the original MultiIndex
and filter the result.
[(a, c) for a, b, c in df.index.tolist()]
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
The disadvantage of this simple method is that you have you manually specify the order of the levels you want. You can leverage itertools.compress
to select them by name instead.
from itertools import compress
mask = [1 if name in ['a', 'c'] else 0 for name in df.index.names]
[tuple(compress(t, mask)) for t in df.index.tolist()]
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
Create a MultiIndex that has exactly the levels you want and call .tolist()
on it.
df.index.droplevel('b').tolist()
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
If you would prefer to name the levels you want to keep — instead of those that you want to drop — you could do something like
df.index.droplevel([level for level in df.index.names
if not level in ['a', 'c']]).tolist()
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
This is less cumbersome insofar as you can pass the list of index names you want to select:
df.reset_index()[['a', 'c']].to_dict(orient='split')['data']
I have not found a way of selecting levels 'a'
and 'b'
from the index object directly, hence the use of reset_index
.
Note that to_dict
returns a list of lists and not tuples:
[[1, 10], [1, 11], [1, 12], [2, 13], [2, 14], [3, 15]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With