pandas to_datetime conversion with timezone-aware index

I have a DataFrame with a timezone-aware index:

>>> dfn.index
Out[1]: 
DatetimeIndex(['2004-01-02 01:00:00+11:00', '2004-01-02 02:00:00+11:00',
               '2004-01-02 03:00:00+11:00', '2004-01-02 04:00:00+11:00',
               '2004-01-02 21:00:00+11:00', '2004-01-02 22:00:00+11:00'],
              dtype='datetime64[ns]', freq='H', tz='Australia/Sydney')

I save it to CSV, then read it back as follows:

>>> dfn.to_csv('temp.csv')
>>> df= pd.read_csv('temp.csv', index_col=0 ,header=None )
>>> df.head()
Out[1]: 
                                1
0                                
NaN                        0.0000
2004-01-02 01:00:00+11:00  0.7519
2004-01-02 02:00:00+11:00  0.7520
2004-01-02 03:00:00+11:00  0.7515
2004-01-02 04:00:00+11:00  0.7502

The index is read back as strings:

>>> df.index[1]
Out[3]: '2004-01-02 01:00:00+11:00'

Converting it with to_datetime changes the time: the +11:00 offset is applied and the result is a naive timestamp in UTC:

>>> df.index = pd.to_datetime(df.index)
>>> df.index[1]
Out[6]: Timestamp('2004-01-01 14:00:00')

I could shift the index by 11 hours to fix it, but is there a better way to handle this?

I tried the solution from the answer here, but it slows the code down a lot.

asked Nov 09 '17 06:11

dayum

1 Answer

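I think the problem is that the header has to be written and read the same way. to_csv writes a header row by default, but read_csv with header=None treats that row as data, which would explain the NaN first row in your output. To parse the dates, you also need the parse_dates parameter.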

# the header row is written to the file
dfn.to_csv('temp.csv')
# but the header row is not read back as a header
df = pd.read_csv('temp.csv', index_col=0, header=None)
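The mismatch can be seen without touching the disk. This is only a minimal sketch: it assumes a small frame shaped like the sample data, uses an in-memory buffer instead of temp.csv, and the names csv_text, bad and good are made up for illustration.

```python
import io
import pandas as pd

# Small frame shaped like the sample data. '60min' is used instead of 'H'
# only because the 'H' alias is deprecated in recent pandas releases.
idx = pd.date_range('2004-01-02 01:00:00', periods=3, freq='60min',
                    tz='Australia/Sydney')
dfn = pd.DataFrame({'col': range(len(idx))}, index=idx)

csv_text = dfn.to_csv()                # no path given: the CSV is returned as a string
first_line = csv_text.splitlines()[0]  # the header row that to_csv writes by default

# header=None: the header row is read as data, so one extra row appears
bad = pd.read_csv(io.StringIO(csv_text), index_col=0, header=None)

# default header handling: the first row is consumed as column names
good = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(repr(first_line))
print(len(bad), len(good))
```

The extra row in bad is the header line itself, with an empty (NaN) index entry because the index has no name.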

Solution 1:

# do not write the header
dfn.to_csv('temp.csv', header=None)
# do not read a header
df = pd.read_csv('temp.csv', index_col=0, header=None, parse_dates=[0])

Solution 2:

# write the header
dfn.to_csv('temp.csv')
# read the header
df = pd.read_csv('temp.csv', index_col=0, parse_dates=[0])

Unfortunately, parse_dates converts the dates to UTC, so the timezone has to be added back afterwards:

df.index = df.index.tz_localize('UTC').tz_convert('Australia/Sydney')
print (df.index)
DatetimeIndex(['2004-01-02 01:00:00+11:00', '2004-01-02 02:00:00+11:00',
               '2004-01-02 03:00:00+11:00', '2004-01-02 04:00:00+11:00',
               '2004-01-02 05:00:00+11:00', '2004-01-02 06:00:00+11:00',
               '2004-01-02 07:00:00+11:00', '2004-01-02 08:00:00+11:00',
               '2004-01-02 09:00:00+11:00', '2004-01-02 10:00:00+11:00'],
              dtype='datetime64[ns, Australia/Sydney]', name=0, freq=None)
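What parse_dates returns for offset strings depends on the pandas version. On versions where it already returns a timezone-aware index, tz_localize('UTC') fails because the index is no longer naive. A hedged alternative, sketched here on a two-row inline CSV that stands in for temp.csv, is to read the index as strings and convert it explicitly with pd.to_datetime(..., utc=True) followed by tz_convert:

```python
import io
import pandas as pd

# Inline stand-in for temp.csv (an assumption for this sketch)
csv_text = (
    ",col\n"
    "2004-01-02 01:00:00+11:00,0\n"
    "2004-01-02 02:00:00+11:00,1\n"
)

# No parse_dates: the index stays as plain strings
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# utc=True applies each offset and returns a timezone-aware UTC index
utc_index = pd.to_datetime(df.index, utc=True)

# Convert from UTC back to the original zone; wall-clock times are restored
df.index = utc_index.tz_convert('Australia/Sydney')

print(df.index)
```

Because the conversion is done on the whole index at once, it avoids per-row parsing, and the original wall-clock times (01:00 and 02:00 at +11:00) come back unchanged.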

Sample data for testing:

idx = pd.date_range('2004-01-02 01:00:00', periods=10, freq='H', tz='Australia/Sydney')
dfn = pd.DataFrame({'col':range(len(idx))}, index=idx)
print (dfn)
                           col
2004-01-02 01:00:00+11:00    0
2004-01-02 02:00:00+11:00    1
2004-01-02 03:00:00+11:00    2
2004-01-02 04:00:00+11:00    3
2004-01-02 05:00:00+11:00    4
2004-01-02 06:00:00+11:00    5
2004-01-02 07:00:00+11:00    6
2004-01-02 08:00:00+11:00    7
2004-01-02 09:00:00+11:00    8
2004-01-02 10:00:00+11:00    9

answered Oct 16 '22 18:10

jezrael

Tags: python, datetime, pandas, timestamp-with-timezone
