The following transformation (ms -> datetime -> convert timezone) takes a long time to run (about 4 minutes), probably because I am working with a large dataframe:
import pandas as pd

for column in ['A', 'B', 'C', 'D', 'E']:
    # Data comes in Unix time (ms), so I need to convert it to datetime
    df[column] = pd.to_datetime(df[column], unit='ms')
    # Get times in EST
    df[column] = df[column].apply(lambda x: x.tz_localize('UTC').tz_convert('US/Eastern'))
Is there any way to speed it up? Am I already using Pandas data structures and methods in the most efficient manner?
These are available as DatetimeIndex methods, which will be much faster because they operate on the whole column at once instead of calling the lambda row by row via apply:
df[column] = pd.DatetimeIndex(df[column]).tz_localize('UTC').tz_convert('US/Eastern')
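For example, the question's loop could be rewritten like this. This is a minimal sketch that assumes the same column names as the question and adds a tiny synthetic DataFrame so it runs on its own:

import pandas as pd

# Tiny synthetic stand-in for the real data: Unix timestamps in milliseconds
df = pd.DataFrame({c: [1400000000000, 1400000060000, 1400000120000]
                   for c in ['A', 'B', 'C', 'D', 'E']})

for column in ['A', 'B', 'C', 'D', 'E']:
    # Convert ms -> datetime64, then localize/convert the whole column in one vectorized call
    df[column] = (pd.DatetimeIndex(pd.to_datetime(df[column], unit='ms'))
                  .tz_localize('UTC')
                  .tz_convert('US/Eastern'))

print(df.dtypes)  # each column is now timezone-aware (US/Eastern)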
Note: in pandas 0.15.0 you'll have access to these via the Series .dt accessor:
df[column] = df[column].dt.tz_localize('UTC').tz_convert('US/Eastern')
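Put together with the conversion step, the .dt version of the loop would look like the sketch below (assuming pandas 0.15.0 or later; the commented utc=True shortcut, which localizes during parsing, is available in more recent versions):

import pandas as pd

# Same kind of synthetic millisecond-timestamp frame as above
df = pd.DataFrame({c: [1400000000000, 1400000060000, 1400000120000]
                   for c in ['A', 'B', 'C', 'D', 'E']})

for column in ['A', 'B', 'C', 'D', 'E']:
    # .dt exposes the same tz methods on a Series, so no row-by-row apply is needed
    df[column] = (pd.to_datetime(df[column], unit='ms')
                  .dt.tz_localize('UTC')
                  .dt.tz_convert('US/Eastern'))

    # In recent pandas, utc=True localizes during parsing, collapsing this to one call:
    # df[column] = pd.to_datetime(df[column], unit='ms', utc=True).dt.tz_convert('US/Eastern')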