
Python: Fill in missing datetime values in a dataframe and fill forward?

Let's say I have a dataframe as:

|       timestamp     | value |
| ------------------- | ----- |
| 01/01/2013 00:00:00 |  2.1  |
| 01/01/2013 00:00:03 |  3.7  |
| 01/01/2013 00:00:05 |  2.4  |

I'd like to have the dataframe as:

|       timestamp     | value |
| ------------------- | ----- |
| 01/01/2013 00:00:00 |  2.1  |
| 01/01/2013 00:00:01 |  2.1  |
| 01/01/2013 00:00:02 |  2.1  |
| 01/01/2013 00:00:03 |  3.7  |
| 01/01/2013 00:00:04 |  3.7  |
| 01/01/2013 00:00:05 |  2.4  |

How do I go about this?

aswa09 asked May 04 '17 08:05


People also ask

How do you fill missing values in a time series Python?

One way to impute missing values in time series data is to fill them with either the last or the next observed value. Pandas has a fillna() function with a method parameter: "ffill" fills a gap with the previously observed value, and "bfill" fills it with the next observed value.
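The difference between the two fill directions is easy to see on a tiny series. The sketch below is only an illustration with made-up values; it uses the `ffill()` and `bfill()` shortcut methods rather than `fillna(method=...)`, which newer pandas releases deprecate.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a two-element gap in the middle
s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()   # each gap takes the last value seen before it
backward = s.bfill()  # each gap takes the next value seen after it

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0]
```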

How do I fill missing date in pandas?

By using pd.date_range(start_date, end_date).difference(dates), we get all the dates in the range that are not present in our list of dates.
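Applied to the timestamps from the question, that idea might look like the sketch below. The variable names are illustrative, and the day/month/year format string is an assumption about how the strings are written (for 01/01/2013 either reading gives the same date).

```python
import pandas as pd

# Timestamps from the question; the format string is an assumption
observed = pd.to_datetime(
    ["01/01/2013 00:00:00", "01/01/2013 00:00:03", "01/01/2013 00:00:05"],
    format="%d/%m/%Y %H:%M:%S",
)

# Every second between the first and last observation
full_range = pd.date_range(
    observed.min(), observed.max(), freq=pd.Timedelta(seconds=1)
)

# Timestamps in the full range that were never observed
missing = full_range.difference(observed)
print(missing)
```

This only lists the gaps; it does not fill them.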

How do you fill a missing date in time series?

One method for filling the missing values is a forward fill. With this approach, the value directly prior is used to fill the missing value. For example, the 2nd through 4th were missing in our data and will be filled with the value from the 1st (1.0).


1 Answer

You can use resample with ffill:

import pandas as pd

# df is the dataframe from the question; timestamp starts out as strings
print (df.dtypes)
timestamp     object
value        float64
dtype: object

df['timestamp'] = pd.to_datetime(df['timestamp'])

print (df.dtypes)
timestamp    datetime64[ns]
value               float64
dtype: object

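# Store the result under a new name so df keeps its 'timestamp' column.
# Newer pandas releases deprecate the 'S' alias in favor of 's'.
df_indexed = df.set_index('timestamp').resample('S').ffill()
print (df_indexed)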
                     value
timestamp                 
2013-01-01 00:00:00    2.1
2013-01-01 00:00:01    2.1
2013-01-01 00:00:02    2.1
2013-01-01 00:00:03    3.7
2013-01-01 00:00:04    3.7
2013-01-01 00:00:05    2.4

# Chain reset_index() to turn timestamp back into a regular column
df = df.set_index('timestamp').resample('S').ffill().reset_index()
print (df)
            timestamp  value
0 2013-01-01 00:00:00    2.1
1 2013-01-01 00:00:01    2.1
2 2013-01-01 00:00:02    2.1
3 2013-01-01 00:00:03    3.7
4 2013-01-01 00:00:04    3.7
5 2013-01-01 00:00:05    2.4
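
For anyone who wants to run this from scratch, here is a minimal self-contained sketch that builds the dataframe from the question first. Two things are assumptions rather than part of the answer: the explicit day/month/year format string, and passing `pd.Timedelta(seconds=1)` instead of the `'S'` alias so the frequency does not depend on the pandas version.

```python
import pandas as pd

# Rebuild the question's dataframe; timestamps start out as strings
df = pd.DataFrame({
    "timestamp": [
        "01/01/2013 00:00:00",
        "01/01/2013 00:00:03",
        "01/01/2013 00:00:05",
    ],
    "value": [2.1, 3.7, 2.4],
})

# Parse the strings (the format is an assumption about the data)
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d/%m/%Y %H:%M:%S")

# Resample to one row per second, forward-fill the new rows,
# then move timestamp back from the index to a column
filled = (
    df.set_index("timestamp")
      .resample(pd.Timedelta(seconds=1))
      .ffill()
      .reset_index()
)
print(filled)
```

Note that `resample` needs a datetime index, which is why the `set_index` step comes first.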
jezrael answered Sep 24 '22 12:09