Let's say I have a DataFrame like this:

| timestamp | value |
| ------------------- | ----- |
| 01/01/2013 00:00:00 | 2.1 |
| 01/01/2013 00:00:03 | 3.7 |
| 01/01/2013 00:00:05 | 2.4 |

I'd like to turn it into one row per second, with each new row filled with the last known value:

| timestamp | value |
| ------------------- | ----- |
| 01/01/2013 00:00:00 | 2.1 |
| 01/01/2013 00:00:01 | 2.1 |
| 01/01/2013 00:00:02 | 2.1 |
| 01/01/2013 00:00:03 | 3.7 |
| 01/01/2013 00:00:04 | 3.7 |
| 01/01/2013 00:00:05 | 2.4 |

How do I go about this?

One way to impute missing values in time series data is to fill them with either the last or the next observed value. pandas' `fillna()` has a `method` parameter where `"ffill"` fills a gap with the previous observed value and `"bfill"` fills it with the next observed value. (Since pandas 2.1, `fillna(method=...)` is deprecated in favour of calling `ffill()` or `bfill()` directly.)

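A minimal sketch of the difference, using hypothetical values with NaN gaps. It calls `Series.ffill()` and `Series.bfill()` directly rather than `fillna(method=...)`:

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps (NaN) between observations
s = pd.Series([2.1, np.nan, np.nan, 3.7, np.nan, 2.4])

forward = s.ffill()   # each NaN takes the previous observed value
backward = s.bfill()  # each NaN takes the next observed value

print(pd.DataFrame({"original": s, "ffill": forward, "bfill": backward}))
```
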
To find which dates are missing, `pd.date_range(start_date, end_date).difference(dates)` returns every date in the range that is not present in your list of dates.

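A small sketch applied to the question's timestamps (the variable names are illustrative). `pd.date_range` defaults to daily frequency, so `freq="s"` is passed here to enumerate seconds:

```python
import pandas as pd

# Timestamps observed in the question
ts = pd.to_datetime(
    ["2013-01-01 00:00:00", "2013-01-01 00:00:03", "2013-01-01 00:00:05"]
)

# Every second between the first and last observation
full_range = pd.date_range(ts.min(), ts.max(), freq="s")

# Seconds in the full range that never appear in ts
missing = full_range.difference(ts)
print(missing)
```
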
Forward fill is one method for filling missing values: each missing entry takes the value directly before it. For example, if the 2nd through the 4th are missing and the 1st has the value 1.0, all three are filled with 1.0.

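A sketch of that example, assuming hypothetical daily data where only the 1st (1.0) and the 5th (a made-up 5.0) are observed. `reindex` inserts the missing days as NaN and `ffill()` then carries 1.0 forward:

```python
import pandas as pd

# Hypothetical daily data: the 2nd through 4th are absent
s = pd.Series([1.0, 5.0], index=pd.to_datetime(["2013-01-01", "2013-01-05"]))

# Add the missing days as NaN rows, then forward-fill them
filled = s.reindex(pd.date_range("2013-01-01", "2013-01-05")).ffill()
print(filled)
```
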
You can use `resample` with `ffill`:

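```python
import pandas as pd

# Sample data from the question (timestamps start out as strings)
df = pd.DataFrame({
    'timestamp': ['01/01/2013 00:00:00', '01/01/2013 00:00:03', '01/01/2013 00:00:05'],
    'value': [2.1, 3.7, 2.4],
})
print(df.dtypes)
# timestamp     object
# value        float64
# dtype: object

# Convert the strings to datetime64 (month-first format assumed)
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %H:%M:%S')
print(df.dtypes)
# timestamp    datetime64[ns]
# value               float64
# dtype: object

# Upsample to 1-second bins and fill each new row with the last known value.
# 's' is the seconds alias; older code uses 'S', deprecated since pandas 2.2.
out = df.set_index('timestamp').resample('s').ffill()
print(out)
#                      value
# timestamp
# 2013-01-01 00:00:00    2.1
# 2013-01-01 00:00:01    2.1
# 2013-01-01 00:00:02    2.1
# 2013-01-01 00:00:03    3.7
# 2013-01-01 00:00:04    3.7
# 2013-01-01 00:00:05    2.4
```

To keep `timestamp` as a regular column instead of the index, append `reset_index()`:

```python
out = df.set_index('timestamp').resample('s').ffill().reset_index()
print(out)
#             timestamp  value
# 0 2013-01-01 00:00:00    2.1
# 1 2013-01-01 00:00:01    2.1
# 2 2013-01-01 00:00:02    2.1
# 3 2013-01-01 00:00:03    3.7
# 4 2013-01-01 00:00:04    3.7
# 5 2013-01-01 00:00:05    2.4
```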
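Another option, sketched here as an alternative rather than the answer's method, is `DataFrame.asfreq` with `method="ffill"`. It lays a regular grid from the first to the last timestamp, whereas `resample` aligns its bins to whole-second boundaries. For timestamps that already fall on whole seconds, as in the question, both should give the same six rows:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["01/01/2013 00:00:00", "01/01/2013 00:00:03", "01/01/2013 00:00:05"],
        format="%m/%d/%Y %H:%M:%S",  # month-first format assumed
    ),
    "value": [2.1, 3.7, 2.4],
})

# asfreq builds a regular 1-second grid from the first to the last timestamp
# and forward-fills the rows that had no observation
out = df.set_index("timestamp").asfreq("s", method="ffill").reset_index()
print(out)
```
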