Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python, Pandas: tz_localize AmbiguousTimeError: Cannot infer dst time with non DST dates

I'm trying to import some timeseries data and convert it into UTC so I can merge it with another dataset. This data seems to have 24 hour data and doesn't have DST adjustments. This post gives a similar answer, but they simply drop the line. I need to shift it so I can merge it with my other data.

When I run my code:

df = pd.read_csv('http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2010/data/hourly/{}/{}_{}_solar.csv'.format(723898,723898,1998), usecols=["YYYY-MM-DD", "HH:MM (LST)","Meas Glo (Wh/m^2)","Meas Dir (Wh/m^2)","Meas Dif (Wh/m^2)"])

def clean_time(obj):
    hour = int(obj[0:-3])
    hour = str(hour - 1)
    if len(str(hour)) == 2:
        return hour+":00"
    else:
        return "0" + hour + ":00"

df['HH:MM (LST)'] = df['HH:MM (LST)'].apply(clean_time)
df['DateTime'] = df['YYYY-MM-DD'] + " " + df['HH:MM (LST)']
df = df.set_index(pd.DatetimeIndex(df['DateTime']))
df.drop(["YYYY-MM-DD", "HH:MM (LST)",'DateTime'],axis=1,inplace=True)
df.index = df.index.tz_localize('US/Pacific', ambiguous='infer')

I get:

pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 1998-10-25 01:00:00 as there are no repeated times

If I leave ambiguous='raise' (the default), it gives me:

pytz.exceptions.NonExistentTimeError: 1998-04-05 02:00:00

So I'm stuck on either the start, or end of daylight savings time.

There's quite a few of these datasets (multiple sites over multiple years) I need to merge, so I'd prefer not to hand code specific hours to shift, but I'm still a novice and can't quite figure out my next steps here.

Appreciate the help!

like image 210
Eric O Avatar asked Apr 21 '16 01:04

Eric O


1 Answers

Minimal reproduction scenario:

from datetime import datetime, timedelta

import pandas as pd


df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
df.index.tz_localize('Europe/Amsterdam', ambiguous='infer')

pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 2019-10-27 02:00:00 as there are no repeated times

Solution: manually specify which datetime objects must be considered DT (Daylight Time) or DST (Daylight Savings Time). See documentation.

from datetime import datetime, timedelta
import numpy as np
import pandas as pd


df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
infer_dst = np.array([False] * df.shape[0])  # all False -> every row considered DT, alternative is True to indicate DST. The array must correspond to the iloc of df.index
df.index.tz_localize('Europe/Amsterdam', ambiguous=infer_dst)  # no error
like image 159
Freek Wiekmeijer Avatar answered Sep 21 '22 09:09

Freek Wiekmeijer