
Resample dataframe with count method for certain conditions

I'm trying to resample data from a dataframe. Columns have different types of data. For one of the columns I'd like to count the rows for which the column has a value larger than 0.

A small example would look like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(data={'Date': pd.date_range('2018-01-01','2018-01-15'),
                        'A': np.random.randint(5, size=15)})
df.set_index(df.Date, inplace=True)

df.resample('5D').count()

Counting works, but I can't find a way to insert the condition that I only want to count values larger than 0. Something like this:

df.resample('5D').count(df[df.A > 0])

However, this raises TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Question: How can I use resample().count() with a condition?

Jeroen asked Jul 10 '18 12:07


1 Answer

You can use Resampler.apply and sum the True values of the condition, which are treated as 1s:

import pandas as pd
import numpy as np

# Seed after importing numpy so the random sample is reproducible
np.random.seed(57)

df = pd.DataFrame(data={'Date': pd.date_range('2018-01-01','2018-01-15'),
                        'A': np.random.randint(5, size=15)})
df.set_index(df.Date, inplace=True)

df1 = df.resample('5D')['A'].apply(lambda x: (x > 0).sum())
print (df1)
Date
2018-01-01    2
2018-01-06    3
2018-01-11    4
Name: A, dtype: int64
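
Since the question mentions columns of different types, the same lambda can also be passed per column through agg. The following is only a minimal sketch: the fixed values in 'A' and the extra float column 'B' are made up for illustration and are not the random data above.

```python
import pandas as pd

# Hypothetical fixed data (not the random data above) so the result is reproducible.
idx = pd.date_range('2018-01-01', '2018-01-15')
df = pd.DataFrame({'A': [0, 1, 2, 0, 3, 0, 0, 4, 1, 0, 2, 2, 0, 1, 3],
                   'B': [float(i) for i in range(15)]},  # 'B' is an assumed extra column
                  index=idx)

# One aggregation per column: conditional count for 'A', plain mean for 'B'.
result = df.resample('5D').agg({'A': lambda x: (x > 0).sum(), 'B': 'mean'})
print(result)
```

Each key of the dict names a column and each value is the aggregation applied to that column within every 5-day bin.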

A better solution is to build a boolean mask first and then aggregate it with sum after resampling:

df1 = (df['A'] > 0).resample('5D').sum().astype(int)
print (df1)

Date
2018-01-01    2
2018-01-06    3
2018-01-11    4
Name: A, dtype: int32
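
To check that both approaches agree, here is a small sketch on fixed, made-up values (not the seeded random data above). It also shows a third variant, filtering the rows before resampling, which looks equivalent but, as far as I can tell, anchors the bins at the first remaining row, so the bin edges can shift.

```python
import pandas as pd

# Hypothetical fixed values so the comparison is reproducible.
idx = pd.date_range('2018-01-01', '2018-01-15')
a = pd.Series([0, 1, 2, 0, 3, 0, 0, 4, 1, 0, 2, 2, 0, 1, 3], index=idx, name='A')

# Approach 1: apply a lambda to every 5-day bin.
by_apply = a.resample('5D').apply(lambda x: (x > 0).sum())

# Approach 2: boolean mask first, then sum per bin.
by_mask = (a > 0).resample('5D').sum().astype(int)

# Variant: drop the zero rows first, then count.
# The first row (2018-01-01) is 0 here, so the bins start one day later.
filtered = a[a > 0].resample('5D').count()

print(by_apply.equals(by_mask.astype(by_apply.dtype)))
print(by_mask.index[0], filtered.index[0])
```

The mask approach keeps the original index, so its bins always match those of a plain df.resample('5D') on the full data.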

jezrael answered Nov 17 '22 17:11