I have some data, shown below, with hour in UTC. I want to create a new column named local_hour based on time_zone. How can I do that? It seems that pandas' tz_convert does not accept a column or pandas Series as the tz argument.
# Create dataframe
import pandas as pd
df = pd.DataFrame({
'hour': ['2019-01-01 05:00:00', '2019-01-01 07:00:00', '2019-01-01 08:00:00'],
'time_zone': ['US/Eastern', 'US/Central', 'US/Mountain']
})
# Convert hour to datetime and localize to UTC
df['hour'] = pd.to_datetime(df['hour']).dt.tz_localize('UTC')
df
hour time_zone
0 2019-01-01 05:00:00+00:00 US/Eastern
1 2019-01-01 07:00:00+00:00 US/Central
2 2019-01-01 08:00:00+00:00 US/Mountain
# Create local_hour column to convert hour to US/Eastern time (this works)
df['local_hour'] = df['hour'].dt.tz_convert(tz='US/Eastern')
df
hour time_zone local_hour
0 2019-01-01 05:00:00+00:00 US/Eastern 2019-01-01 00:00:00-05:00
1 2019-01-01 07:00:00+00:00 US/Central 2019-01-01 02:00:00-05:00
2 2019-01-01 08:00:00+00:00 US/Mountain 2019-01-01 03:00:00-05:00
# Try to create local_hour column to convert hour based on time_zone column (this fails)
df['local_hour'] = df['hour'].dt.tz_convert(tz=df['time_zone'])
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
In pandas, you can convert a column (string/object or integer type) to datetime using the to_datetime() and astype() methods.
pandas Timestamp objects can be compared with the ordinary comparison operators: >, <, ==, <=, >=. The difference between two timestamps can be calculated with the '-' operator. A given time can be converted to a pandas timestamp with pandas.Timestamp().
pandas has a built-in function called to_datetime() that converts dates and times in string format to datetime objects. When a column such as 'date' holds strings (object dtype), to_datetime() converts it to a Series of the appropriate datetime64 dtype.
dt.tz_convert expects a scalar value for its tz parameter, not a list of timezone-like values. Use apply, which is essentially a loop:
df['local_hour'] = df.apply(lambda row: row['hour'].tz_convert(row['time_zone']), axis=1)
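Because the rows end up in different time zones, the column produced by apply has object dtype rather than a single datetime64 dtype. If wall-clock local times without an offset are enough, a possible alternative is to convert one time-zone group at a time and drop the offset, which keeps a plain datetime64 column. This is only a sketch under that assumption; the column name local_naive is made up for illustration and is not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'hour': ['2019-01-01 05:00:00', '2019-01-01 07:00:00', '2019-01-01 08:00:00'],
    'time_zone': ['US/Eastern', 'US/Central', 'US/Mountain'],
})
df['hour'] = pd.to_datetime(df['hour']).dt.tz_localize('UTC')

# For each distinct time zone, convert that group's rows in one vectorized
# call, then strip the offset so all groups can share one naive dtype.
pieces = [
    grp.dt.tz_convert(tz).dt.tz_localize(None)
    for tz, grp in df.groupby('time_zone')['hour']
]

# 'local_naive' is a hypothetical column name; assignment aligns on the index,
# so the rows land back in their original positions.
df['local_naive'] = pd.concat(pieces)
print(df[['hour', 'time_zone', 'local_naive']])
```

This loops once per distinct time zone instead of once per row, which may matter on large frames with few zones. The trade-off is that the UTC offset is no longer stored in the result.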
You can use babel and datetime instead:
import pandas as pd
from datetime import datetime
from babel.dates import format_datetime,get_timezone
# Create dataframe
df = pd.DataFrame({
'hour': ['2019-01-01 05:00:00', '2019-01-01 07:00:00', '2019-01-01 08:00:00'],
'time_zone': ['US/Eastern', 'US/Central', 'US/Mountain']
})
# First: convert hour column items to datetime objects
df['hour']=df['hour'].map(lambda hh: datetime.strptime(hh, '%Y-%m-%d %H:%M:%S'))
# Second: create local_hour column by converting hour based on the time_zone column
# (babel treats the naive datetimes as UTC; access row values by label, not position)
df['local_hour']=df[['hour','time_zone']].apply(lambda x: format_datetime(x['hour'], "yyyy-MM-dd HH:mm:ssZZ",
tzinfo=get_timezone(x['time_zone']), locale='en'),axis=1)
# Third: Convert hour to datetime and localize to UTC (this was your first step)
df['hour']=df['hour'].map(lambda hh: format_datetime(hh, "yyyy-MM-dd HH:mm:ssZZ",
tzinfo=get_timezone('UTC'), locale='en'))
df
hour time_zone local_hour
0 2019-01-01 05:00:00+0000 US/Eastern 2019-01-01 00:00:00-0500
1 2019-01-01 07:00:00+0000 US/Central 2019-01-01 01:00:00-0600
2 2019-01-01 08:00:00+0000 US/Mountain 2019-01-01 01:00:00-0700