Either I don't understand the documentation or it is outdated.
If I run
user[["DOC_ACC_DT", "USER_SIGNON_ID"]].groupby("DOC_ACC_DT").agg(["count"]).resample("1D").fillna(value=0, method="ffill")
It get
TypeError: fillna() got an unexpected keyword argument 'value'
If I just run
.fillna(0)
I get
ValueError: Invalid fill method. Expecting pad (ffill), backfill (bfill) or nearest. Got 0
If I then set
.fillna(0, method="ffill")
I get
TypeError: fillna() got multiple values for keyword argument 'method'
so the only thing that works is
.fillna("ffill")
but of course that makes just a forward fill. However, I want to replace NaN
with zeros. What am I doing wrong here?
Well, I don't get why the code above is not working and I'm going to wait for somebody to give a better answer than this but I just found
.replace(np.nan, 0)
does what I would have expected from .fillna(0)
.
I do some test and it is very interesting.
Sample:
import pandas as pd
import numpy as np
np.random.seed(1)
rng = pd.date_range('1/1/2012', periods=20, freq='S')
df = pd.DataFrame({'a':['a'] * 10 + ['b'] * 10,
'b':np.random.randint(0, 500, len(rng))}, index=rng)
df.b.iloc[3:8] = np.nan
print (df)
a b
2012-01-01 00:00:00 a 37.0
2012-01-01 00:00:01 a 235.0
2012-01-01 00:00:02 a 396.0
2012-01-01 00:00:03 a NaN
2012-01-01 00:00:04 a NaN
2012-01-01 00:00:05 a NaN
2012-01-01 00:00:06 a NaN
2012-01-01 00:00:07 a NaN
2012-01-01 00:00:08 a 335.0
2012-01-01 00:00:09 a 448.0
2012-01-01 00:00:10 b 144.0
2012-01-01 00:00:11 b 129.0
2012-01-01 00:00:12 b 460.0
2012-01-01 00:00:13 b 71.0
2012-01-01 00:00:14 b 237.0
2012-01-01 00:00:15 b 390.0
2012-01-01 00:00:16 b 281.0
2012-01-01 00:00:17 b 178.0
2012-01-01 00:00:18 b 276.0
2012-01-01 00:00:19 b 254.0
Downsampling:
Possible solution with Resampler.asfreq
:
If use asfreq
, behaviour is same aggregating by first
:
print (df.groupby('a').resample('2S').first())
a b
a
a 2012-01-01 00:00:00 a 37.0
2012-01-01 00:00:02 a 396.0
2012-01-01 00:00:04 a NaN
2012-01-01 00:00:06 a NaN
2012-01-01 00:00:08 a 335.0
b 2012-01-01 00:00:10 b 144.0
2012-01-01 00:00:12 b 460.0
2012-01-01 00:00:14 b 237.0
2012-01-01 00:00:16 b 281.0
2012-01-01 00:00:18 b 276.0
print (df.groupby('a').resample('2S').first().fillna(0))
a b
a
a 2012-01-01 00:00:00 a 37.0
2012-01-01 00:00:02 a 396.0
2012-01-01 00:00:04 a 0.0
2012-01-01 00:00:06 a 0.0
2012-01-01 00:00:08 a 335.0
b 2012-01-01 00:00:10 b 144.0
2012-01-01 00:00:12 b 460.0
2012-01-01 00:00:14 b 237.0
2012-01-01 00:00:16 b 281.0
2012-01-01 00:00:18 b 276.0
print (df.groupby('a').resample('2S').asfreq().fillna(0))
a b
a
a 2012-01-01 00:00:00 a 37.0
2012-01-01 00:00:02 a 396.0
2012-01-01 00:00:04 a 0.0
2012-01-01 00:00:06 a 0.0
2012-01-01 00:00:08 a 335.0
b 2012-01-01 00:00:10 b 144.0
2012-01-01 00:00:12 b 460.0
2012-01-01 00:00:14 b 237.0
2012-01-01 00:00:16 b 281.0
2012-01-01 00:00:18 b 276.0
If use replace
another values are aggregating as mean
:
print (df.groupby('a').resample('2S').mean())
b
a
a 2012-01-01 00:00:00 136.0
2012-01-01 00:00:02 396.0
2012-01-01 00:00:04 NaN
2012-01-01 00:00:06 NaN
2012-01-01 00:00:08 391.5
b 2012-01-01 00:00:10 136.5
2012-01-01 00:00:12 265.5
2012-01-01 00:00:14 313.5
2012-01-01 00:00:16 229.5
2012-01-01 00:00:18 265.0
print (df.groupby('a').resample('2S').mean().fillna(0))
b
a
a 2012-01-01 00:00:00 136.0
2012-01-01 00:00:02 396.0
2012-01-01 00:00:04 0.0
2012-01-01 00:00:06 0.0
2012-01-01 00:00:08 391.5
b 2012-01-01 00:00:10 136.5
2012-01-01 00:00:12 265.5
2012-01-01 00:00:14 313.5
2012-01-01 00:00:16 229.5
2012-01-01 00:00:18 265.0
print (df.groupby('a').resample('2S').replace(np.nan,0))
b
a
a 2012-01-01 00:00:00 136.0
2012-01-01 00:00:02 396.0
2012-01-01 00:00:04 0.0
2012-01-01 00:00:06 0.0
2012-01-01 00:00:08 391.5
b 2012-01-01 00:00:10 136.5
2012-01-01 00:00:12 265.5
2012-01-01 00:00:14 313.5
2012-01-01 00:00:16 229.5
2012-01-01 00:00:18 265.0
Upsampling:
Use asfreq
, it is same as replace
:
print (df.groupby('a').resample('200L').asfreq().fillna(0))
a b
a
a 2012-01-01 00:00:00.000 a 37.0
2012-01-01 00:00:00.200 0 0.0
2012-01-01 00:00:00.400 0 0.0
2012-01-01 00:00:00.600 0 0.0
2012-01-01 00:00:00.800 0 0.0
2012-01-01 00:00:01.000 a 235.0
2012-01-01 00:00:01.200 0 0.0
2012-01-01 00:00:01.400 0 0.0
2012-01-01 00:00:01.600 0 0.0
2012-01-01 00:00:01.800 0 0.0
2012-01-01 00:00:02.000 a 396.0
2012-01-01 00:00:02.200 0 0.0
2012-01-01 00:00:02.400 0 0.0
...
print (df.groupby('a').resample('200L').replace(np.nan,0))
b
a
a 2012-01-01 00:00:00.000 37.0
2012-01-01 00:00:00.200 0.0
2012-01-01 00:00:00.400 0.0
2012-01-01 00:00:00.600 0.0
2012-01-01 00:00:00.800 0.0
2012-01-01 00:00:01.000 235.0
2012-01-01 00:00:01.200 0.0
2012-01-01 00:00:01.400 0.0
2012-01-01 00:00:01.600 0.0
2012-01-01 00:00:01.800 0.0
2012-01-01 00:00:02.000 396.0
2012-01-01 00:00:02.200 0.0
2012-01-01 00:00:02.400 0.0
...
print ((df.groupby('a').resample('200L').replace(np.nan,0).b ==
df.groupby('a').resample('200L').asfreq().fillna(0).b).all())
True
Conclusion:
For downsampling use same aggregating function like sum
, first
or mean
and for upsampling asfreq
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With