I'm trying to import some timeseries data and convert it into UTC so I can merge it with another dataset. This data seems to have 24 hour data and doesn't have DST adjustments. This post gives a similar answer, but they simply drop the line. I need to shift it so I can merge it with my other data.
When I run my code:
df = pd.read_csv('http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2010/data/hourly/{}/{}_{}_solar.csv'.format(723898,723898,1998), usecols=["YYYY-MM-DD", "HH:MM (LST)","Meas Glo (Wh/m^2)","Meas Dir (Wh/m^2)","Meas Dif (Wh/m^2)"])
def clean_time(obj):
hour = int(obj[0:-3])
hour = str(hour - 1)
if len(str(hour)) == 2:
return hour+":00"
else:
return "0" + hour + ":00"
df['HH:MM (LST)'] = df['HH:MM (LST)'].apply(clean_time)
df['DateTime'] = df['YYYY-MM-DD'] + " " + df['HH:MM (LST)']
df = df.set_index(pd.DatetimeIndex(df['DateTime']))
df.drop(["YYYY-MM-DD", "HH:MM (LST)",'DateTime'],axis=1,inplace=True)
df.index = df.index.tz_localize('US/Pacific', ambiguous='infer')
I get:
pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 1998-10-25 01:00:00 as there are no repeated times
If I leave ambiguous='raise' (the default), it gives me:
pytz.exceptions.NonExistentTimeError: 1998-04-05 02:00:00
So I'm stuck on either the start, or end of daylight savings time.
There's quite a few of these datasets (multiple sites over multiple years) I need to merge, so I'd prefer not to hand code specific hours to shift, but I'm still a novice and can't quite figure out my next steps here.
Appreciate the help!
Minimal reproduction scenario:
from datetime import datetime, timedelta
import pandas as pd
df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
df.index.tz_localize('Europe/Amsterdam', ambiguous='infer')
pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 2019-10-27 02:00:00 as there are no repeated times
Solution: manually specify which datetime objects must be considered DT (Daylight Time) or DST (Daylight Savings Time). See documentation.
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
infer_dst = np.array([False] * df.shape[0]) # all False -> every row considered DT, alternative is True to indicate DST. The array must correspond to the iloc of df.index
df.index.tz_localize('Europe/Amsterdam', ambiguous=infer_dst) # no error
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With