
How to read datetime with timezone in pandas


I am trying to create a DataFrame from a CSV file whose first column looks like this:

"2013-08-25T00:00:00-0400";
"2013-08-25T01:00:00-0400";
"2013-08-25T02:00:00-0400";
"2013-08-25T03:00:00-0400";
"2013-08-25T04:00:00-0400";

The values are datetimes with a timezone offset. I have already tried something like

df1 = DataFrame(pd.read_csv(PeriodC, sep=';', parse_dates=[0], index_col=0))

but the result was

2013-09-02 04:00:00
2013-09-03 04:00:00
2013-09-04 04:00:00
2013-09-05 04:00:00
2013-09-06 04:00:00
2013-09-07 04:00:00
2013-09-08 04:00:00

Can anyone explain how to separate the datetime from the timezone?

asked Sep 20 '13 07:09 by palas



1 Answer

The pandas parser takes the timezone offset into account when it is present, but it gives you a naive Timestamp (naive == no timezone info) whose value has already been shifted to UTC.
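To see what "naive, but with the offset taken into account" means, here is a minimal sketch that uses only the standard library's datetime module rather than pandas. The variable names (text, aware, naive_utc) are made up for illustration, and the sketch only reproduces the arithmetic, not pandas' own parsing code.

```python
from datetime import datetime, timezone

text = "2013-08-25T00:00:00-0400"  # first value from the question

# %z understands offsets written as -0400, so the result is timezone-aware
aware = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")

# Shift to UTC, then drop the tzinfo to get a naive value
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)

print(aware)      # 2013-08-25 00:00:00-04:00
print(naive_utc)  # 2013-08-25 04:00:00
```

Midnight at -0400 is 04:00 in UTC, which is why the parsed index in the question shows 04:00:00 with no offset attached.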

To keep the timezone information in your DataFrame, first localize the Timestamps as UTC and then convert them to their timezone (in this case Etc/GMT+4, which despite the plus sign is four hours behind UTC):

>>> df = pd.read_csv(PeriodC, sep=';', parse_dates=[0], index_col=0)
>>> df.index[0]
Timestamp('2013-08-25 04:00:00', tz=None)
>>> df.index = df.index.tz_localize('UTC').tz_convert('Etc/GMT+4')
>>> df.index[0]
Timestamp('2013-08-25 00:00:00-0400', tz='Etc/GMT+4')
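If you want to try the UTC-then-convert idea without the CSV file, the following is a rough self-contained sketch. It assumes a reasonably recent pandas and uses pd.to_datetime(..., utc=True) instead of read_csv followed by tz_localize('UTC'), so that the result does not depend on how a given pandas version parses offsets (newer releases may already return timezone-aware values from read_csv). The names raw, idx and local are illustrative only.

```python
import pandas as pd

# Two of the values from the question, inlined instead of read from a file
raw = ["2013-08-25T00:00:00-0400", "2013-08-25T01:00:00-0400"]

# utc=True applies the offset in the text and returns a UTC-aware DatetimeIndex
idx = pd.to_datetime(raw, utc=True)

# Etc/GMT+4 follows the POSIX sign convention: it is 4 hours behind UTC
local = idx.tz_convert("Etc/GMT+4")

print(idx[0])    # 2013-08-25 04:00:00+00:00
print(local[0])  # 2013-08-25 00:00:00-04:00
```

Both values describe the same instant; only the displayed wall-clock time and offset differ.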

If you want to discard the timezone information completely, specify a date_parser that splits the string and passes only the datetime portion to the parser:

>>> df = pd.read_csv(file, sep=';', parse_dates=[0], index_col=[0],
                     date_parser=lambda x: pd.to_datetime(x.rpartition('-')[0]))
>>> df.index[0]
Timestamp('2013-08-25 00:00:00', tz=None)
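As a hedged alternative that does not rely on the date_parser argument (which is deprecated in recent pandas releases), the same result can be sketched on a plain Series. The names raw, wall_clock and also_wall_clock are made up for this example. Note that splitting on the last '-' only works when the offset is negative, as it is in this data; a '+0200' style offset would need different handling.

```python
import pandas as pd

raw = pd.Series(["2013-08-25T00:00:00-0400", "2013-08-25T01:00:00-0400"])

# Option 1: cut the offset off the text, like the lambda above does.
# str.rpartition returns three columns; column 0 is everything before the last '-'
wall_clock = pd.to_datetime(raw.str.rpartition("-")[0])

# Option 2: parse as UTC, convert to the local zone, then drop the timezone
also_wall_clock = (
    pd.to_datetime(raw, utc=True)
    .dt.tz_convert("Etc/GMT+4")
    .dt.tz_localize(None)
)

print(wall_clock.iloc[0])       # 2013-08-25 00:00:00
print(also_wall_clock.iloc[0])  # 2013-08-25 00:00:00
```

Option 2 is a little longer but does not depend on the sign of the offset in the text.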
answered Sep 28 '22 02:09 by Viktor Kerkez