Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert pandas column with multiple timezones to single timezone

Problem

I've got a column in a pandas DataFrame that contains timestamps with timezones. There are two different timezones present in this column, and I need to ensure that there's only one. Here's the output of the end of the column:

260003    2019-05-21 12:00:00-06:00
260004    2019-05-21 12:15:00-06:00
Name: timestamp, Length: 260005, dtype: object

For what it's worth, the timestamps vary between -06:00 and -07:00, and have the following output:

datetime.datetime(2007, 10, 1, 1, 0, tzinfo=tzoffset(None, -21600)) for -06:00 datetime.datetime(2007, 11, 17, 5, 15, tzinfo=tzoffset(None, -25200)) for -07:00

What I've Done

I've been trying to use tz.localize and tz.convert, which have worked fine in the past, but I suppose that data has only ever had one timezone. E.g., if I do:

df['timestamp'].dt.tz_localize('MST', ambiguous='infer').dt.tz_convert('MST')

I get:

ValueError: Array must be all same time zone

During handling of the above exception, another exception occurred:

ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

Question

Is there a way to convert these to MST? Or any timezone, really? I guess I could break up the DataFrame by timezone (not 100% sure how, but I imagine it's possible) and act on chunks of it, but I figured I'd ask to see if there's a smarter solution out there. Thank you!

like image 684
erekalper Avatar asked May 21 '19 19:05

erekalper


People also ask

How do I convert between time zones in Python?

Use the datetime. astimezone() method to convert the datetime from one timezone to another. This method uses an instance of the datetime object and returns a new datetime of a given timezone.

How do I remove the timezone from a datetime column in Python?

tz_localize(None) method can be applied to the dataframe column to remove the timezone information.

How do you convert a column into a timeframe?

If your DataFrame holds the DateTime in a string column in a specific format, you can convert it by using to_datetime() function as it accepts the format param to specify the format date & time.


Video Answer


3 Answers

I tried:

df = pd.DataFrame({'timestamp':['2019-05-21 12:00:00-06:00',
                                '2019-05-21 12:15:00-07:00']})
df['timestamp'] = pd.to_datetime(df.timestamp)

df.timestamp.dt.tz_localize('MST')

works fine and gives:

0   2019-05-21 18:00:00-07:00
1   2019-05-21 19:15:00-07:00
Name: timestamp, dtype: datetime64[ns, MST]

It is not what you expect?


Edit: Thanks to @G.Anderson's comment, I tried the different data with timezone-aware timestamps:

df = pd.DataFrame({'timestamp':[pd.to_datetime('2019-05-21 12:00:00').tz_localize('MST'),
                         pd.to_datetime('2019-05-21 12:15:00').tz_localize('EST')]})

then

df['timestamp'] = pd.to_datetime(df.timestamp)

did give the same error. Then I added utc=True:

df.timestamp = pd.to_datetime(df.timestamp, utc=True)

# df.timestamp
# 0   2019-05-21 19:00:00+00:00
# 1   2019-05-21 17:15:00+00:00
# Name: timestamp, dtype: datetime64[ns, UTC]

df.timestamp.dt.tz_convert('MST')

works fine and gives:

0   2019-05-21 12:00:00-07:00
1   2019-05-21 10:15:00-07:00
Name: timestamp, dtype: datetime64[ns, MST]
like image 172
Quang Hoang Avatar answered Oct 12 '22 10:10

Quang Hoang


# input data
import pandas as pd
series = pd.Series(data=
    [pd.to_datetime('2019-01-01 00:00:00').tz_localize('MST'),
     pd.to_datetime('2019-01-01 01:10:00').tz_localize('UTC')])
print(series)

gives

0    2019-01-01 00:00:00-07:00
1    2019-01-01 01:10:00+00:00
dtype: object

Indeed,

series.dt.tz_convert('MST')

gives "ValueError: Array must be all same time zone" and "ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True". So, it seems, you have to do it in a non-vectorized way:

new_series = pd.Series(index=series.index,
    data=[x.tz_convert('MST') for x in series])
print(new_series)

gives

0   2019-01-01 00:00:00-07:00
1   2018-12-31 18:10:00-07:00
dtype: datetime64[ns, MST]

Edit: if @QuangHoang is right (i.e. "the option automatically localizes the timestamps to utc", double check it!) about the meaning of pd.to_datetime(..., utc=True), then the following solution will also work:

new_series = pd.to_datetime(series, utc=True).dt.tz_convert('MST')
print(new_series)

gives

0   2019-01-01 00:00:00-07:00
1   2018-12-31 18:10:00-07:00
dtype: datetime64[ns, MST]
like image 45
S.V Avatar answered Oct 12 '22 11:10

S.V


Let's have a series a with multiple different timezones. We expect a.tz_convert() or a.tz_localize() work but they don't. The solution is to use the apply method. Please see the examples below:

> a
0    2019-10-04 16:30:00+02:00
1    2019-10-07 16:00:00-04:00
2    2019-09-24 08:30:00-07:00
Name: localized, dtype: object

> a.iloc[0]
Timestamp('2019-10-04 16:30:00+0200', tz='Europe/Amsterdam')

> a.apply(lambda x: x.tz_convert('America/Los_Angeles'))
0   2019-10-04 07:30:00-07:00
1   2019-10-07 13:00:00-07:00
2   2019-09-24 08:30:00-07:00
Name: localized, dtype: datetime64[ns, America/Los_Angeles]

# Make it tz-naive, i.e. remove tz info, note you lose information here, you might want to store tz-info in another series before the conversion.
> a.apply(lambda x: x.tz_localize(None))
0   2019-10-04 16:30:00
1   2019-10-07 16:00:00
2   2019-09-24 08:30:00
Name: localized, dtype: datetime64[ns]
like image 39
tozCSS Avatar answered Oct 12 '22 11:10

tozCSS