I'm trying to resample data from a dataframe. Columns have different types of data. For one of the columns I'd like to count the rows for which the column has a value larger than 0.
A small example would look like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'Date': pd.date_range('2018-01-01','2018-01-15'),
'A': np.random.randint(5, size=15)})
df.set_index(df.Date, inplace=True)
df.resample('5D').count()
Counting works, but I can't find a way to insert the condition that I only want to count values larger than 0. Something like this:
df.resample('5D').count(df[df.A > 0])
However, this raises TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
Question: How to resample().count() with conditions
You can use Resampler.apply and sum the True values of the condition, which are treated as 1s:
import pandas as pd
import numpy as np
# seed after importing numpy so the random sample is reproducible
np.random.seed(57)
df = pd.DataFrame(data={'Date': pd.date_range('2018-01-01','2018-01-15'),
'A': np.random.randint(5, size=15)})
df.set_index(df.Date, inplace=True)
df1 = df.resample('5D')['A'].apply(lambda x: (x > 0).sum())
print (df1)
Date
2018-01-01 2
2018-01-06 3
2018-01-11 4
Name: A, dtype: int64
A better solution is to create a boolean mask first, then resample it and aggregate with sum:
df1 = (df['A'] > 0).resample('5D').sum().astype(int)
print (df1)
Date
2018-01-01 2
2018-01-06 3
2018-01-11 4
Name: A, dtype: int32
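To check that both approaches agree without depending on a random seed, here is a minimal sketch with hand-picked values. The values list is a hypothetical input, not data from the question; the calls are the same ones used above.

```python
import pandas as pd

# Hypothetical fixed values (not from the original post), 15 daily rows,
# so each 5-day bin can be counted by hand.
values = [0, 3, 0, 1, 2,
          4, 0, 0, 0, 1,
          0, 0, 0, 0, 2]
idx = pd.date_range('2018-01-01', '2018-01-15')
df = pd.DataFrame({'A': values}, index=idx)

# Approach 1: apply a function to each bin and count the True values.
via_apply = df.resample('5D')['A'].apply(lambda x: (x > 0).sum())

# Approach 2: build the boolean mask first, then resample and sum it.
via_mask = (df['A'] > 0).resample('5D').sum().astype(int)

print(via_apply.tolist())  # [3, 2, 1]
print(via_mask.tolist())   # [3, 2, 1]
```

The mask version avoids calling a Python function once per bin, which is likely why it is the preferred form for larger frames.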