I have a DataFrame with several columns, indexed by dates. I would like to pad (forward-fill) missing values, but only for the next x days. That is, a missing value should not be padded if its index is more than x days after the previous non-missing value in that column.
I did something with a loop, but it is not very efficient. Is there a better and more elegant way of doing it?
Note that the dates in my index are not equally spaced, so the limit argument on its own will not work.
You can use pandas.DataFrame.fillna with the method='ffill' option. 'ffill' stands for 'forward fill' and propagates the last valid observation forward.
The ffill() method fills missing values in the DataFrame the same way. inplace: if True, fill in place.
The fillna() method replaces missing (NaN) values with a specified value. It returns a new DataFrame unless the inplace parameter is set to True, in which case the replacement is done in the original DataFrame instead.
You can use the limit argument of fillna:
df.fillna(method='ffill', limit=3) # ffill is equivalent to pad
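The same argument is available for the ffill and bfill convenience methods, e.g. df.ffill(limit=3). Recent pandas versions deprecate the method argument of fillna in favor of these.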
limit : int, default None
    Maximum size gap to forward or backward fill
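To make the point about limit concrete, here is a small sketch (the dates and values are made up for illustration). It shows that limit counts consecutive missing rows, not calendar days, which is why it is not enough on its own for an irregular index:

```python
import numpy as np
import pandas as pd

# Irregularly spaced dates: there is an 8-day jump between the 2nd and 3rd rows.
idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-10", "2013-01-11"])
s = pd.Series([1.0, np.nan, np.nan, np.nan], index=idx)

# limit=2 fills the first two missing ROWS after the last valid value,
# regardless of how far apart their dates are.
filled = s.ffill(limit=2)
print(filled)
```

Here the 2013-01-10 row is filled even though it is 9 days after the last valid observation, because it is only the second missing row in the run.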
If your dates aren't evenly spaced, you can resample (by day) first:
df.resample('D').asfreq()
See also the missing data section of the docs.
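Putting the two steps together might look like the sketch below. The helper name ffill_within_days and the sample data are my own, not part of pandas, and it assumes the index holds unique, midnight-normalized dates so that one row per day makes limit count days:

```python
import numpy as np
import pandas as pd

def ffill_within_days(df, max_days):
    """Hypothetical helper: forward-fill each column, but at most max_days days ahead."""
    # Expand to one row per calendar day; days not in the original become NaN rows.
    daily = df.resample("D").asfreq()
    # On a daily grid, a limit of N rows is a limit of N days.
    filled = daily.ffill(limit=max_days)
    # Keep only the dates that were in the original index.
    return filled.reindex(df.index)

idx = pd.to_datetime(["2013-01-01", "2013-01-03", "2013-01-10", "2013-01-11"])
df = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan, 4.0], "B": [np.nan, 2.0, 3.0, np.nan]},
    index=idx,
)
result = ffill_within_days(df, max_days=3)
print(result)
```

In column A the 2013-01-03 gap is filled (2 days after the last value) while 2013-01-10 stays missing (9 days after). Drop the final reindex if you want to keep the daily rows.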
This illustrates what I meant
In [19]: from numpy.random import randn; from pandas import DataFrame, date_range
In [20]: df = DataFrame(randn(10,2),columns=list('AB'),index=date_range('20130101',periods=3).append([date_range('20130110',periods=3),date_range('20130120',periods=4)]))
In [21]: df
Out[21]:
                   A         B
2013-01-01 -0.176354  1.033962
2013-01-02  0.666911 -0.018723
2013-01-03  0.300097  1.552866
2013-01-10  0.581816 -1.188106
2013-01-11 -0.394817 -1.018765
2013-01-12  1.000461 -1.211131
2013-01-20  0.097940  1.225805
2013-01-21 -2.205975 -0.455641
2013-01-22  0.508865 -0.403321
2013-01-23 -0.726969  0.448002
In [22]: df.reindex(index=date_range('20130101','20130125')).ffill(limit=2)
Out[22]:
                   A         B
2013-01-01 -0.176354  1.033962
2013-01-02  0.666911 -0.018723
2013-01-03  0.300097  1.552866
2013-01-04  0.300097  1.552866
2013-01-05  0.300097  1.552866
2013-01-06       NaN       NaN
2013-01-07       NaN       NaN
2013-01-08       NaN       NaN
2013-01-09       NaN       NaN
2013-01-10  0.581816 -1.188106
2013-01-11 -0.394817 -1.018765
2013-01-12  1.000461 -1.211131
2013-01-13  1.000461 -1.211131
2013-01-14  1.000461 -1.211131
2013-01-15       NaN       NaN
2013-01-16       NaN       NaN
2013-01-17       NaN       NaN
2013-01-18       NaN       NaN
2013-01-19       NaN       NaN
2013-01-20  0.097940  1.225805
2013-01-21 -2.205975 -0.455641
2013-01-22  0.508865 -0.403321
2013-01-23 -0.726969  0.448002
2013-01-24 -0.726969  0.448002
2013-01-25 -0.726969  0.448002
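If you would rather not expand to a daily grid (for example because the index carries times of day), one alternative is to forward-fill everything and then blank out values whose last valid observation is too old. This is only a sketch under my own assumptions: ffill_time_limited is a hypothetical name, the data is invented, and it loops over columns (not rows):

```python
import numpy as np
import pandas as pd

def ffill_time_limited(df, max_gap):
    """Hypothetical helper: forward-fill, keeping values at most max_gap old."""
    max_gap = pd.Timedelta(max_gap)
    filled = df.ffill()
    # The index as a column of values, so it can be masked and forward-filled.
    stamps = pd.Series(df.index, index=df.index)
    out = {}
    for col in df.columns:
        # Timestamp of the most recent valid observation at or before each row.
        last_valid = stamps.where(df[col].notna()).ffill()
        # Age of the value being carried forward (NaT if nothing seen yet).
        age = stamps - last_valid
        # Comparisons with NaT are False, so leading gaps stay missing.
        out[col] = filled[col].where(age <= max_gap)
    return pd.DataFrame(out, index=df.index)

idx = pd.to_datetime(["2013-01-01", "2013-01-03", "2013-01-10", "2013-01-11"])
frame = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan, 4.0], "B": [np.nan, 2.0, 3.0, np.nan]},
    index=idx,
)
limited = ffill_time_limited(frame, "3D")
print(limited)
```

Because the cutoff is a Timedelta, the same function works for limits such as "36h" without changing the index.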