
fillna: how to pad values over the next x days

Tags:

pandas

I have a DataFrame with several columns, indexed by dates. I would like to pad (forward-fill) missing values, but only for the next x days: a missing value should not be padded if its index is more than x days after the previous non-missing value in that column.

I did something with a loop, but it is not very efficient. Is there a better and more elegant way of doing it?

Note that the dates in my index are not equally spaced, so the limit argument, which counts rows rather than days, will not work on its own.

Maxi asked Jun 11 '13 06:06




2 Answers

You can use the limit argument of fillna:

df.fillna(method='ffill', limit=3)  # ffill is equivalent to pad

The same argument is available for the ffill, bfill convenience functions.

limit : int, default None
       Maximum size gap to forward or backward fill
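
To make the behaviour of limit concrete, here is a minimal sketch (the Series and its dates are made up for illustration). It suggests that limit counts consecutive missing rows, not calendar days, so on an irregular index a value can be carried across a long gap.

```python
import numpy as np
import pandas as pd

# Hypothetical data: irregularly spaced dates with two missing values.
idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-20", "2013-01-21"])
s = pd.Series([1.0, np.nan, np.nan, 4.0], index=idx)

# limit=2 fills up to two consecutive NaN rows, regardless of how many
# days separate those rows from the last valid observation.
filled = s.ffill(limit=2)
print(filled)
```

Here the row for 2013-01-20 is filled with the value from 2013-01-01, 19 days earlier, which is exactly what the question wants to avoid.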

If your dates aren't evenly spaced, you can resample (by day) first:

df.resample('D').asfreq()  # one row per calendar day; newly created rows are NaN
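
Putting the two steps together, one possible sketch is shown below. The helper name pad_next_days and the sample data are hypothetical, and the code assumes a sorted, unique DatetimeIndex whose timestamps all fall at midnight. It resamples to a daily grid, forward-fills with a limit, and then selects the original dates again.

```python
import numpy as np
import pandas as pd

def pad_next_days(df, days):
    """Forward-fill NaNs, but at most `days` calendar days past the last valid value.

    Assumes a sorted, unique DatetimeIndex with all timestamps at midnight.
    """
    daily = df.resample("D").asfreq()   # one row per calendar day
    padded = daily.ffill(limit=days)    # on a daily grid, limit counts days
    return padded.reindex(df.index)     # keep only the original dates

# Made-up example: 2013-01-03 is 2 days after the last valid value,
# 2013-01-10 is 9 days after it.
idx = pd.to_datetime(["2013-01-01", "2013-01-03", "2013-01-10", "2013-01-11"])
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0]}, index=idx)
result = pad_next_days(df, days=3)
print(result)
```

With days=3, the value on 2013-01-03 is padded while the one on 2013-01-10 stays missing.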

See also the missing data section of the docs.

Andy Hayden answered Oct 17 '22 18:10


This illustrates what I meant

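In [19]: from numpy.random import randn; from pandas import DataFrame, date_range

In [20]: df = DataFrame(randn(10, 2), columns=list('AB'),
   ....:                index=date_range('20130101', periods=3)
   ....:                      .append([date_range('20130110', periods=3),
   ....:                               date_range('20130120', periods=4)]))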

In [21]: df
Out[21]: 
                   A         B
2013-01-01 -0.176354  1.033962
2013-01-02  0.666911 -0.018723
2013-01-03  0.300097  1.552866
2013-01-10  0.581816 -1.188106
2013-01-11 -0.394817 -1.018765
2013-01-12  1.000461 -1.211131
2013-01-20  0.097940  1.225805
2013-01-21 -2.205975 -0.455641
2013-01-22  0.508865 -0.403321
2013-01-23 -0.726969  0.448002

In [22]: df.reindex(index=date_range('20130101','20130125')).ffill(limit=2)
Out[22]: 
                   A         B
2013-01-01 -0.176354  1.033962
2013-01-02  0.666911 -0.018723
2013-01-03  0.300097  1.552866
2013-01-04  0.300097  1.552866
2013-01-05  0.300097  1.552866
2013-01-06       NaN       NaN
2013-01-07       NaN       NaN
2013-01-08       NaN       NaN
2013-01-09       NaN       NaN
2013-01-10  0.581816 -1.188106
2013-01-11 -0.394817 -1.018765
2013-01-12  1.000461 -1.211131
2013-01-13  1.000461 -1.211131
2013-01-14  1.000461 -1.211131
2013-01-15       NaN       NaN
2013-01-16       NaN       NaN
2013-01-17       NaN       NaN
2013-01-18       NaN       NaN
2013-01-19       NaN       NaN
2013-01-20  0.097940  1.225805
2013-01-21 -2.205975 -0.455641
2013-01-22  0.508865 -0.403321
2013-01-23 -0.726969  0.448002
2013-01-24 -0.726969  0.448002
2013-01-25 -0.726969  0.448002
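
If building a daily grid is not desirable (for example, when timestamps are not at midnight), a different technique from the one shown above is to forward-fill everything and then mask values that are too old. The following is only a sketch under that assumption: the helper name ffill_within and the sample data are hypothetical, and a sorted, unique DatetimeIndex is assumed.

```python
import numpy as np
import pandas as pd

def ffill_within(df, max_gap):
    """Forward-fill each column, keeping a filled value only when the last
    valid observation is at most `max_gap` (a Timedelta) older than the row."""
    stamps = df.index.to_series()
    # Per column: time elapsed since the most recent valid observation
    # (NaT for rows that precede the first valid value).
    age = df.notna().apply(lambda col: stamps - stamps.where(col).ffill())
    return df.ffill().where(age <= max_gap)

# Made-up example with irregular dates.
idx = pd.to_datetime(["2013-01-01", "2013-01-03", "2013-01-10", "2013-01-11"])
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0]}, index=idx)
result = ffill_within(df, pd.Timedelta(days=3))
print(result)
```

This works on the original index directly, so no rows are added or dropped.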

Jeff answered Oct 17 '22 19:10