Pandas DataFrame resampling based on column criteria

Question

I want to resample a dataframe when the cell in another column matches my criteria.

import pandas as pd

df = pd.DataFrame({
    'timestamp': [
        '2013-03-01 08:01:00', '2013-03-01 08:02:00',
        '2013-03-01 08:03:00', '2013-03-01 08:04:00',
        '2013-03-01 08:05:00', '2013-03-01 08:06:00'
    ],
    'Kind': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [1, 1.5, 2, 3, 5, 3]
})

For every timestamp I may have 2 to 10 kinds, and I want to resample correctly without producing NaN. Currently I resample the entire dataframe with the code below and get NaNs. I think this is because I have multiple entries for certain timestamps.

df.set_index('timestamp').resample('5Min').mean()

One method is to create different dataframes for every kind, resample every dataframe, and join the resulting dataframes. I'd like to find out if there's any simple way of doing it.

Cedric Zoppolo · Accepted Answer

After defining your dataframe as you stated, first convert the timestamp column to datetime. Then set it as the index, and finally resample and take the mean as follows:

import pandas as pd
df = pd.DataFrame({
        'timestamp': [
            '2013-03-01 08:01:00', '2013-03-01 08:02:00',
            '2013-03-01 08:03:00', '2013-03-01 08:04:00',
            '2013-03-01 08:05:00', '2013-03-01 08:06:00'
        ],
        'Kind': [
            'A', 'B', 'A', 'B', 'A', 'B'
        ],
        'Values': [1, 1.5, 2, 3, 5, 3]
    })

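# Convert the string timestamps to datetime so the index can be resampled
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")
# Average only the numeric column ('Kind' holds strings). closed/label="right"
# reproduce the right-edge bins shown in the output below.
df = df[["Values"]].resample("5min", closed="right", label="right").mean()
print(df.mean())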

This would print the mean you expect:

>>> 
Values    2.75

And the resampled dataframe would be:

>>> df
                     Values
timestamp                  
2013-03-01 08:05:00     2.5
2013-03-01 08:10:00     3.0
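
The 08:05 and 08:10 labels above correspond to bins that are closed and labeled on their right edge. As a minimal sketch (not part of the original answer), the following compares that convention with left-edge bins on the same data, passing `closed` and `label` explicitly so the result does not depend on the defaults of the installed pandas version. The names `values`, `left` and `right` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2013-03-01 08:01:00", "2013-03-01 08:02:00",
        "2013-03-01 08:03:00", "2013-03-01 08:04:00",
        "2013-03-01 08:05:00", "2013-03-01 08:06:00",
    ]),
    "Kind": ["A", "B", "A", "B", "A", "B"],
    "Values": [1, 1.5, 2, 3, 5, 3],
}).set_index("timestamp")

# Keep only the numeric column; 'Kind' holds strings and cannot be averaged
values = df[["Values"]]

# Bins [08:00, 08:05) and [08:05, 08:10), labeled by their left edge
left = values.resample("5min", closed="left", label="left").mean()

# Bins (08:00, 08:05] and (08:05, 08:10], labeled by their right edge
right = values.resample("5min", closed="right", label="right").mean()

print(left)
print(right)
```

With right-edge bins the 08:05 row falls into the first bin, which is why the first mean covers five values; with left-edge bins it starts the second bin instead.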

Grouping by kind

If you want to group by Kind and get the mean for each kind (that is, A and B), you can do the following:

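df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")
# Resample each Kind separately; the result is indexed by (Kind, timestamp)
gb = df.groupby("Kind")
df = gb[["Values"]].resample("5min", closed="right", label="right").mean()
print(df.xs("A", level="Kind").mean())
print(df.xs("B", level="Kind").mean())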

As a result you would get:

>>> 
Values    2.666667
Values    2.625

And your dataframe would finally look like this:

>>> df
                            Values
Kind timestamp                    
A    2013-03-01 08:05:00  2.666667
B    2013-03-01 08:05:00  2.250000
     2013-03-01 08:10:00  3.000000
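
If one column per Kind is more convenient than the stacked result above (the question mentions joining per-kind dataframes), one possible sketch is to group by Kind and by a 5-minute time bin together, then unstack. This is an alternative to the approach in the answer, not a restatement of it. `bins`, `per_kind` and `wide` are illustrative names, and `closed`/`label` are set to `"right"` only to match the bin edges in the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": [
        "2013-03-01 08:01:00", "2013-03-01 08:02:00",
        "2013-03-01 08:03:00", "2013-03-01 08:04:00",
        "2013-03-01 08:05:00", "2013-03-01 08:06:00",
    ],
    "Kind": ["A", "B", "A", "B", "A", "B"],
    "Values": [1, 1.5, 2, 3, 5, 3],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")

# Group by Kind and by 5-minute bins of the datetime index in one step
bins = pd.Grouper(freq="5min", closed="right", label="right")
per_kind = df.groupby(["Kind", bins])["Values"].mean()

# Move Kind from the row index into the columns: one column per kind
wide = per_kind.unstack("Kind")
print(wide)
```

Kind A has no rows in the second bin, so that cell is NaN in `wide`. It reflects missing data for that kind and interval rather than a resampling error.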

Tags:

python

pandas

dataframe

resampling

yusica
