In a multi index
pandas dataframe I want to access the last element of the second index for all values of the first index. The number of levels in the second index vary depending on the value of the first index. I went through the pandas multi index documentation but could not find anything that does that.
For example, for the data frame below:
arrays = [ ['bar', 'bar', 'baz', 'foo', 'foo', 'foo', 'qux'],
['one', 'two', 'one', 'one', 'two', 'three', 'one']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(7, 3), index=index, columns=['A', 'B', 'C'])
df
A B C
first second
bar one 0.289163 -0.464633 -0.060487
two 0.224442 0.177609 2.156436
baz one -0.262329 -0.248384 0.925580
foo one 0.051350 0.452014 0.206809
two 2.757255 -0.739196 0.183735
three -0.064909 -0.963130 1.364771
qux one -1.330857 1.881588 -0.262170
I want to get:
A B C
first second
bar two 0.224442 0.177609 2.156436
baz one -0.262329 -0.248384 0.925580
foo three -0.064909 -0.963130 1.364771
qux one -1.330857 1.881588 -0.262170
The dataframes
I am working with have over 10M
lines so I want to avoid explicit looping.
Use groupby
with tail
:
print (df.groupby(level='first').tail(1))
A B C
first second
bar two 0.053054 -0.555819 0.589998
baz one -0.868676 1.293633 1.339474
foo three 0.407454 0.738872 1.811894
qux one -0.346014 -1.491270 0.446772
because last
lost level second
:
print (df.groupby(level='first').last())
A B C
first
bar 0.053054 -0.555819 0.589998
baz -0.868676 1.293633 1.339474
foo 0.407454 0.738872 1.811894
qux -0.346014 -1.491270 0.446772
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With