Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to access last element of a multi-index dataframe

Tags:

python

pandas

I have a dataframe with IDs and timestamps as a multi-index. The index in the dataframe is sorted by IDs and timestamps and I want to pick the lastest timestamp for each IDs. for example:

IDs    timestamp     value
0      2010-10-30     1
       2010-11-30     2
1      2000-01-01     300
       2007-01-01     33
       2010-01-01     400
2      2000-01-01     11

So basically the result I want is

IDs    timestamp    value
0      2010-11-30   2
1      2010-01-01   400
2      2000-01-01   11

What is the command to do that in pandas?

like image 619
silencer Avatar asked Oct 02 '13 19:10

silencer


2 Answers

Given this setup:

import pandas as pd
import numpy as np
import io

content = io.BytesIO("""\
IDs    timestamp     value
0      2010-10-30     1
0      2010-11-30     2
1      2000-01-01     300
1      2007-01-01     33
1      2010-01-01     400
2      2000-01-01     11""")

df = pd.read_table(content, header=0, sep='\s+', parse_dates=[1])
df.set_index(['IDs', 'timestamp'], inplace=True)

using reset_index followed by groupby

df.reset_index(['timestamp'], inplace=True)
print(df.groupby(level=0).last())

yields

              timestamp  value
IDs                           
0   2010-11-30 00:00:00      2
1   2010-01-01 00:00:00    400
2   2000-01-01 00:00:00     11

This does not feel like the best solution, however. There should be a way to do this without calling reset_index...


As you point out in the comments, last ignores NaN values. To not skip NaN values, you could use groupby/agg like this:

df.reset_index(['timestamp'], inplace=True)
grouped = df.groupby(level=0)
print(grouped.agg(lambda x: x.iloc[-1]))
like image 141
unutbu Avatar answered Oct 11 '22 18:10

unutbu


One can also use

df.groupby("IDs").tail(1)

This will take the last row of each label in level "IDs" and will not ignore NaN values.

like image 4
Tai Avatar answered Oct 11 '22 17:10

Tai