Speeding up timestamp operations

Tags: python, pandas

The following transformation (ms -> datetime -> convert timezone) takes a long time to run (4 minutes), probably because I am working with a large dataframe:

for column in ['A', 'B', 'C', 'D', 'E']:
    # Data comes in unix time (ms) so I need to convert it to datetime
    df[column] = pd.to_datetime(df[column], unit='ms')

    # Localize to UTC, then convert to US Eastern time
    df[column] = df[column].apply(lambda x: x.tz_localize('UTC').tz_convert('US/Eastern'))

Is there any way to speed it up? Am I already using Pandas data structures and methods in the most efficient manner?

Amelio Vazquez-Reina asked Mar 19 '23 13:03


1 Answer

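Both operations are available as DatetimeIndex methods. They work on the whole column at once instead of calling a Python lambda per element, so they will be much faster than apply: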

df[column] = pd.DatetimeIndex(df[column]).tz_localize('UTC').tz_convert('US/Eastern')
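As a quick sanity check, here is a minimal sketch comparing the element-wise version from the question with the DatetimeIndex version. The sample frame and the names `naive`, `slow`, `fast` and `same` are made up for illustration; the real data is not shown in the question.

```python
import pandas as pd

# Hypothetical sample data: Unix timestamps in milliseconds.
df = pd.DataFrame({"A": [1394000000000, 1394000060000, 1394003600000]})

naive = pd.to_datetime(df["A"], unit="ms")

# Element-wise version from the question (one Python call per row).
slow = naive.apply(lambda x: x.tz_localize("UTC").tz_convert("US/Eastern"))

# Vectorized version: one call over the whole column.
fast = pd.Series(
    pd.DatetimeIndex(naive).tz_localize("UTC").tz_convert("US/Eastern"),
    index=df.index,
)

# Both approaches should describe the same instants.
same = bool((slow == fast).all())
```

On a frame this small the timing difference is negligible; the gap should only show up on large columns like the one described in the question.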

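Note: From pandas 0.15.0 onwards these are also available through the Series dt accessor. Both calls need the accessor, since a bare Series.tz_convert operates on the index rather than on the values:

df[column] = df[column].dt.tz_localize('UTC').dt.tz_convert('US/Eastern')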
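Putting it together for several columns, a possible sketch (assuming a reasonably recent pandas, and using a made-up two-column frame in place of the real data) is to let `pd.to_datetime` produce UTC-aware values directly via `utc=True`, which makes the separate `tz_localize` step unnecessary:

```python
import pandas as pd

# Hypothetical sample frame; values are Unix timestamps in milliseconds.
df = pd.DataFrame({
    "A": [1394000000000, 1394003600000],
    "B": [1404000000000, 1404003600000],
})

for column in ["A", "B"]:
    # utc=True returns a tz-aware UTC column, so only tz_convert is needed.
    df[column] = pd.to_datetime(df[column], unit="ms", utc=True).dt.tz_convert("US/Eastern")
```

The result should match the two-step localize/convert version; it just saves one pass over each column.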
Andy Hayden answered Mar 27 '23 16:03