I have a dataframe with columns Datetime, lat, lon, and z. I am reading the data in from a CSV file, so setting the period for the datetimes does not work. The times are at 6-hour intervals, but I want to linearly interpolate the data to hourly intervals.
Go from
'A' 'B' 'C' 'D'
0 2010-09-13 18:00:00 16.3 -78.5 1
1 2010-09-14 00:00:00 16.6 -79.8 6
2 2010-09-14 06:00:00 17.0 -81.1 12
To
'A' 'B' 'C' 'D'
1 2010-09-13 18:00:00 16.3 -78.5 1
2 2010-09-13 19:00:00 16.35 -78.7 2
3 2010-09-13 20:00:00 16.4 -78.9 3
4 2010-09-13 21:00:00 16.45 -79.1 4
5 2010-09-13 22:00:00 16.5 -79.3 5
....
I have tried using the interpolate command but there are no arguments for a new length of the dataframe.
df.interpolate(method='linear')
I was thinking that I could use .loc to insert 5 rows of NaNs between each pair of rows in the dataframe and then use the interpolate function, but that seems like a bad workaround.
Solution: reindexing with a DatetimeIndex loses the association with the other columns (they come back as NaN) if the initial column was not imported as datetime.
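import pandas as pd
df['A'] = pd.to_datetime(df['A'])  # make sure the column holds real datetimes, not strings
i = pd.date_range(start=df['A'].min(), end=df['A'].max(), freq='h')  # hourly index
df = df.set_index('A').reindex(i).interpolate()  # align on the timestamps, then fill the new NaN rows
print(df)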
Gives the correct answer.
# assumes df is already indexed by its datetime column
i = pd.date_range(start=df.index.min(), end=df.index.max(), freq='h')
df = df.reindex(i).interpolate()
print(df)
outputs
2010-09-13 18:00:00 16.300000 -78.500000
2010-09-13 19:00:00 16.350000 -78.716667
2010-09-13 20:00:00 16.400000 -78.933333
2010-09-13 21:00:00 16.450000 -79.150000
2010-09-13 22:00:00 16.500000 -79.366667
Create a new index with the desired frequency using DatetimeIndex.
Call reindex with this new index. By default, values for new indices will be np.nan.
Call interpolate to fill in these missing values. You can supply the method kwarg to determine how interpolation is done.
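The three steps above can be put together in one short sketch. It is only an illustration under a few assumptions: the CSV text is a hypothetical stand-in for the real file, the column names A to D follow the question, the variable names are made up, and the hourly index is built with pd.date_range, since recent pandas versions no longer accept start, end, and freq in the DatetimeIndex constructor.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real file; columns follow the question.
csv_text = """A,B,C,D
2010-09-13 18:00:00,16.3,-78.5,1
2010-09-14 00:00:00,16.6,-79.8,6
2010-09-14 06:00:00,17.0,-81.1,12
"""

# parse_dates turns column A into real datetimes instead of strings.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["A"])

# Use the timestamps as the index so that reindex can align on them.
df = df.set_index("A")

# Step 1: hourly index from the first to the last timestamp.
hourly_index = pd.date_range(start=df.index.min(), end=df.index.max(), freq="h")

# Step 2: reindex, which adds the missing hours as NaN rows.
# Step 3: interpolate, which fills those rows linearly.
hourly = df.reindex(hourly_index).interpolate(method="linear")

print(hourly.head())
```

Because the new rows are evenly spaced, method="linear" and method="time" should give the same values here; method="time" would matter if the target index were irregular.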