
Pandas replace values in dataframe timeseries

I have a pandas dataframe df with pandas.tseries.index.DatetimeIndex as index.

The data is like this:

Time                    Open     High      Low    Close     Volume
2007-04-01 21:02:00    1.968    2.389    1.968    2.389  18.300000
2007-04-01 21:03:00  157.140  157.140  157.140  157.140   2.400000

....

I want to replace one data point, let's say 2.389 in column Close, with NaN:

In: df["Close"].replace(2.389, np.nan)
Out: 2007-04-01 21:02:00      2.389
     2007-04-01 21:03:00    157.140

Replace did not change 2.389 to NaN. What's wrong?

harbun asked Jan 16 '15 19:01


1 Answer

replace matches floats by exact equality, so it can fail when the value shown in the repr of the DataFrame is a rounded form of the underlying float. For example, the actual Close value might be:

In [141]: df = pd.DataFrame({'Close': [2.389000000001]})

yet the repr of df looks like:

In [142]: df
Out[142]: 
   Close
0  2.389
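
One way to check whether the displayed value really equals the stored one is to compare it directly and print it at full precision. This is a minimal sketch, assuming the stored value is 2.389000000001 as in the example above; the variable names are only illustrative:

```python
import pandas as pd

# Assumed stored value, taken from the example above.
df = pd.DataFrame({'Close': [2.389000000001]})

value = df['Close'].iloc[0]

# Exact float comparison against the number shown in the repr.
exact_match = bool(value == 2.389)

# Full-precision representation of the stored float.
full_repr = repr(float(value))

print(exact_match)  # False
print(full_repr)    # 2.389000000001
```

If the comparison prints False, the rounded display is hiding extra digits, and an exact match will not find the value.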

So instead of checking for float equality, it is usually better to check for closeness:

In [150]: import numpy as np
In [151]: mask = np.isclose(df['Close'], 2.389)

In [152]: mask
Out[152]: array([ True], dtype=bool)

You can then use the boolean mask to select and change the desired values:

In [145]: df.loc[mask, 'Close'] = np.nan

In [146]: df
Out[146]: 
   Close
0    NaN
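
Putting the steps above together on a time-indexed frame like the one in the question might look like the following. It is a sketch under stated assumptions: the extra digits in the first Close value are made up to reproduce the mismatch, and the tolerances are simply the np.isclose defaults written out explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical data modelled on the question; the trailing digits in the
# first Close value are an assumption used to reproduce the problem.
index = pd.to_datetime(['2007-04-01 21:02:00', '2007-04-01 21:03:00'])
df = pd.DataFrame({'Close': [2.389000000001, 157.140]}, index=index)

# Exact matching: the stored float is not equal to 2.389, so nothing changes.
exact = df['Close'].replace(2.389, np.nan)

# Tolerance-based matching; rtol and atol are np.isclose's default values.
mask = np.isclose(df['Close'], 2.389, rtol=1e-05, atol=1e-08)
df.loc[mask, 'Close'] = np.nan

print(exact.isna().sum())        # 0
print(df['Close'].isna().sum())  # 1
```

The tolerances are worth choosing deliberately: too loose a tolerance could also blank out neighbouring values that are legitimately different.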

unutbu answered Sep 27 '22 16:09