I've got a column in a pandas DataFrame that contains timestamps with timezones. There are two different timezones present in this column, and I need to ensure that there's only one. Here's the output of the end of the column:
260003 2019-05-21 12:00:00-06:00
260004 2019-05-21 12:15:00-06:00
Name: timestamp, Length: 260005, dtype: object
For what it's worth, the timestamps vary between -06:00
and -07:00
, and have the following output:
datetime.datetime(2007, 10, 1, 1, 0, tzinfo=tzoffset(None, -21600))
for -06:00
datetime.datetime(2007, 11, 17, 5, 15, tzinfo=tzoffset(None, -25200))
for -07:00
I've been trying to use tz.localize and tz.convert, which have worked fine in the past, but I suppose that data has only ever had one timezone. E.g., if I do:
df['timestamp'].dt.tz_localize('MST', ambiguous='infer').dt.tz_convert('MST')
I get:
ValueError: Array must be all same time zone
During handling of the above exception, another exception occurred:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Is there a way to convert these to MST? Or any timezone, really? I guess I could break up the DataFrame by timezone (not 100% sure how, but I imagine it's possible) and act on chunks of it, but I figured I'd ask to see if there's a smarter solution out there. Thank you!
Use the datetime. astimezone() method to convert the datetime from one timezone to another. This method uses an instance of the datetime object and returns a new datetime of a given timezone.
tz_localize(None) method can be applied to the dataframe column to remove the timezone information.
If your DataFrame holds the DateTime in a string column in a specific format, you can convert it by using to_datetime() function as it accepts the format param to specify the format date & time.
I tried:
df = pd.DataFrame({'timestamp':['2019-05-21 12:00:00-06:00',
'2019-05-21 12:15:00-07:00']})
df['timestamp'] = pd.to_datetime(df.timestamp)
df.timestamp.dt.tz_localize('MST')
works fine and gives:
0 2019-05-21 18:00:00-07:00
1 2019-05-21 19:15:00-07:00
Name: timestamp, dtype: datetime64[ns, MST]
It is not what you expect?
Edit: Thanks to @G.Anderson's comment, I tried the different data with timezone-aware timestamps:
df = pd.DataFrame({'timestamp':[pd.to_datetime('2019-05-21 12:00:00').tz_localize('MST'),
pd.to_datetime('2019-05-21 12:15:00').tz_localize('EST')]})
then
df['timestamp'] = pd.to_datetime(df.timestamp)
did give the same error. Then I added utc=True
:
df.timestamp = pd.to_datetime(df.timestamp, utc=True)
# df.timestamp
# 0 2019-05-21 19:00:00+00:00
# 1 2019-05-21 17:15:00+00:00
# Name: timestamp, dtype: datetime64[ns, UTC]
df.timestamp.dt.tz_convert('MST')
works fine and gives:
0 2019-05-21 12:00:00-07:00
1 2019-05-21 10:15:00-07:00
Name: timestamp, dtype: datetime64[ns, MST]
# input data
import pandas as pd
series = pd.Series(data=
[pd.to_datetime('2019-01-01 00:00:00').tz_localize('MST'),
pd.to_datetime('2019-01-01 01:10:00').tz_localize('UTC')])
print(series)
gives
0 2019-01-01 00:00:00-07:00
1 2019-01-01 01:10:00+00:00
dtype: object
Indeed,
series.dt.tz_convert('MST')
gives "ValueError: Array must be all same time zone" and "ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True". So, it seems, you have to do it in a non-vectorized way:
new_series = pd.Series(index=series.index,
data=[x.tz_convert('MST') for x in series])
print(new_series)
gives
0 2019-01-01 00:00:00-07:00
1 2018-12-31 18:10:00-07:00
dtype: datetime64[ns, MST]
Edit: if @QuangHoang is right (i.e. "the option automatically localizes the timestamps to utc", double check it!) about the meaning of pd.to_datetime(..., utc=True)
, then the following solution will also work:
new_series = pd.to_datetime(series, utc=True).dt.tz_convert('MST')
print(new_series)
gives
0 2019-01-01 00:00:00-07:00
1 2018-12-31 18:10:00-07:00
dtype: datetime64[ns, MST]
Let's have a series a
with multiple different timezones.
We expect a.tz_convert()
or a.tz_localize()
work but they don't.
The solution is to use the apply
method.
Please see the examples below:
> a
0 2019-10-04 16:30:00+02:00
1 2019-10-07 16:00:00-04:00
2 2019-09-24 08:30:00-07:00
Name: localized, dtype: object
> a.iloc[0]
Timestamp('2019-10-04 16:30:00+0200', tz='Europe/Amsterdam')
> a.apply(lambda x: x.tz_convert('America/Los_Angeles'))
0 2019-10-04 07:30:00-07:00
1 2019-10-07 13:00:00-07:00
2 2019-09-24 08:30:00-07:00
Name: localized, dtype: datetime64[ns, America/Los_Angeles]
# Make it tz-naive, i.e. remove tz info, note you lose information here, you might want to store tz-info in another series before the conversion.
> a.apply(lambda x: x.tz_localize(None))
0 2019-10-04 16:30:00
1 2019-10-07 16:00:00
2 2019-09-24 08:30:00
Name: localized, dtype: datetime64[ns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With