I am trying to interpolate time series data, df
, which looks like:
id data lat notes analysis_date
0 17358709 NaN 26.125979 None 2019-09-20 12:00:00+00:00
1 17358709 NaN 26.125979 None 2019-09-20 12:00:00+00:00
2 17352742 -2.331365 26.125979 None 2019-09-20 12:00:00+00:00
3 17358709 -4.424366 26.125979 None 2019-09-20 12:00:00+00:00
I try: df.groupby(['lat', 'lon']).apply(lambda group: group.interpolate(method='linear'))
, and it throws {ValueError}Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear
I suspect the issue is with the fact that I have None
values, and I do not want to interpolate those. What is the solution?
df.dtypes
gives me:
id int64
data float64
lat float64
notes object
analysis_date datetime64[ns, psycopg2.tz.FixedOffsetTimezone...
dtype: object
You can interpolate missing values ( NaN ) in pandas. DataFrame and Series with interpolate() . This article describes the following contents. Use dropna() and fillna() to remove missing values NaN or to fill them with a specific value.
DataFrame.interpolate
has issues with timezone-aware datetime64ns columns, which leads to that rather cryptic error message. E.g.
import pandas as pd
df = pd.DataFrame({'time': pd.to_datetime(['2010', '2011', 'foo', '2012', '2013'],
errors='coerce')})
df['time'] = df.time.dt.tz_localize('UTC').dt.tz_convert('Asia/Kolkata')
df.interpolate()
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear
In this case interpolating that column is unnecessary so only interpolate the column you need. We still want DataFrame.interpolate
so select with [[ ]]
(Series.interpolate
leads to some odd reshaping)
df['data'] = df.groupby(['lat', 'lon']).apply(lambda x: x[['data']].interpolate())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With