The code is:
import numpy as np
import pandas as pd
dateparse = lambda x: pd.datetime.strptime(x,'%d %m %Y %H %M')
vento = pd.read_csv('dados_tpm.txt', header=0, delim_whitespace= True, parse_dates = [['Dia', 'Mes', 'Ano', 'Hora','Minuto']], index_col = False, date_parser = dateparse)
vento1 = vento.rename(columns={'Dia_Mes_Ano_Hora_Minuto': 'Data'})
vento0 = vento1.set_index('Data')
vento_time = pd.DataFrame({'Data':pd.date_range(start='2016-07-12 18:00:00',end='2017-02-28 21:00:00',freq='3H')})
vento_time0 = vento_time.set_index('Data')
vento_2 = pd.merge_asof(vento_time0,vento0, on='Index', tolerance=pd.Timedelta("5 minutes")).fillna('NAN')
Where vento0
is something like:
Index Vel Dir
2016-07-12 16:17:00 9.8 13.8
2016-07-12 16:18:00 10.9 1.8
2016-07-12 16:19:00 10.0 11.1
2016-07-12 16:20:00 11.0 11.0
... ... ...
... ... ...
2017-02-28 22:34:00 9.2 13.7
And vento_time0
seems like:
Index
2016-07-12 18:00:00
2016-07-12 21:00:00
2016-07-13 00:00:00
2016-07-13 03:00:00
... ...
... ...
2017-02-28 21:00:00
My data has an one minute interval and it's non regularised. The objective of this is to put it in a 3 hour interval replacing the missing values with the closests data in a range of five minutes. But when merge_asof
is used, this error appears: KeyError: 'Index'
. I also tried to use Data
, the actual name of indexes but get the same error. The expected output will be:
Index Vel Dir
2016-07-12 18:00:00 8.0 55
2016-07-12 21:00:00 16.0 67
2016-07-13 00:00:00 NAN NAN
2016-07-13 03:00:00 19.0 83
... ...
... ...
2017-02-28 21:00:00 NAN NAN
Can anyone help? Is there a way to use merge_asof function in the Index?
Do something like this:
Use .sort_values(by = 'Data')
instead .set_index
vento0 = vento1.sort_values(by = 'Data')
vento_time0 = vento_time.sort_values(by = 'Data')
After do that, this should work:
vento_2 = pd.merge_asof(vento_time0,vento0, \
tolerance=pd.Timedelta("5 minutes")).fillna('NAN')
Be sure that your 'NAN'
becomes "not a number" using:
vento_2.convert_objects(convert_numeric=True)
After use merge_asof
and convert your 'NAN'
you can set your index.
vento_2.set_index(['Data'], inplace=True)
If you're trying to merge_asof()
on indicies, you need to use the following:
vento_2 = pd.merge_asof(vento_time0,vento0, left_index = True, right_index = True, tolerance=pd.Timedelta("5 minutes")).fillna('NAN')
Docs here: http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.merge_asof.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With