I have a DataFrame with a timezone-aware index:
>>> dfn.index
Out[1]:
DatetimeIndex(['2004-01-02 01:00:00+11:00', '2004-01-02 02:00:00+11:00',
'2004-01-02 03:00:00+11:00', '2004-01-02 04:00:00+11:00',
'2004-01-02 21:00:00+11:00', '2004-01-02 22:00:00+11:00'],
dtype='datetime64[ns]', freq='H', tz='Australia/Sydney')
I save it to a CSV file, then read it back as follows:
>>> dfn.to_csv('temp.csv')
>>> df= pd.read_csv('temp.csv', index_col=0 ,header=None )
>>> df.head()
Out[1]:
1
0
NaN 0.0000
2004-01-02 01:00:00+11:00 0.7519
2004-01-02 02:00:00+11:00 0.7520
2004-01-02 03:00:00+11:00 0.7515
2004-01-02 04:00:00+11:00 0.7502
The index is read as a string
>>> df.index[1]
Out[3]: '2004-01-02 01:00:00+11:00'
Converting it with to_datetime changes the time: the +11:00 offset is applied and the result is the equivalent UTC time, 11 hours earlier:
>>> df.index = pd.to_datetime(df.index)
>>> df.index[1]
Out[6]: Timestamp('2004-01-01 14:00:00')
I could manually shift the index by 11 hours to fix it, but is there a better way to handle this?
I tried the solution from the answer here, but it slows the code down a lot.
Pandas DatetimeIndex.tz_localize() localizes a tz-naive DatetimeIndex to a tz-aware DatetimeIndex: it takes a timezone-naive DatetimeIndex object and makes it timezone-aware.
Timezone-aware objects are Python datetime or time objects that include timezone information. An aware object represents a specific moment in time that is not open to interpretation. Whether a datetime object is timezone-aware can be checked easily.
Late contribution, but I just came across something similar in "Python datetime and pandas give different timestamps for the same date". If you have a timezone-aware datetime in pandas, technically, tz_localize(None) changes the POSIX timestamp (that is used internally) as if the local time from the timestamp was UTC. Local in this context means local in the specified timezone.
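To make the difference concrete, here is a minimal sketch, assuming a reasonably recent pandas with timezone data available. The variable names are illustrative only. It contrasts tz_localize('Australia/Sydney'), tz_localize(None) and tz_convert(None) on two of the timestamps from the question.

```python
import pandas as pd

# A tz-naive index: wall-clock times with no timezone attached.
naive = pd.DatetimeIndex(["2004-01-02 01:00:00", "2004-01-02 02:00:00"])

# tz_localize(tz) attaches a timezone; the wall-clock time is unchanged.
aware = naive.tz_localize("Australia/Sydney")

# tz_localize(None) drops the timezone but keeps the local wall time,
# so the underlying UTC-based timestamp changes.
wall = aware.tz_localize(None)

# tz_convert(None) converts to UTC first and then drops the timezone,
# so the underlying timestamp is preserved.
utc_naive = aware.tz_convert(None)

# An index is timezone-aware when its tz attribute is not None.
print(naive.tz is None)      # True
print(aware.tz is not None)  # True
print(wall[0])               # 2004-01-02 01:00:00
print(utc_naive[0])          # 2004-01-01 14:00:00
```

The last line reproduces the value from the question: 01:00 at +11:00 is 14:00 UTC on the previous day.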
I think the problem is that you need to write and read the file's header the same way.
Also, to parse the dates you need the parse_dates parameter. Your code writes the header but does not read it:
# the header is written to the file
dfn.to_csv('temp.csv')
# but the header is not read back
df = pd.read_csv('temp.csv', index_col=0, header=None)
Solution 1:
# do not write the header
dfn.to_csv('temp.csv', header=None)
# do not read a header
df = pd.read_csv('temp.csv', index_col=0, header=None, parse_dates=[0])
Solution 2:
# write the header
dfn.to_csv('temp.csv')
# read the header
df = pd.read_csv('temp.csv', index_col=0, parse_dates=[0])
Unfortunately, parse_dates converts the dates to UTC, so it is necessary to add the timezone afterwards:
df.index = df.index.tz_localize('UTC').tz_convert('Australia/Sydney')
print (df.index)
DatetimeIndex(['2004-01-02 01:00:00+11:00', '2004-01-02 02:00:00+11:00',
'2004-01-02 03:00:00+11:00', '2004-01-02 04:00:00+11:00',
'2004-01-02 05:00:00+11:00', '2004-01-02 06:00:00+11:00',
'2004-01-02 07:00:00+11:00', '2004-01-02 08:00:00+11:00',
'2004-01-02 09:00:00+11:00', '2004-01-02 10:00:00+11:00'],
dtype='datetime64[ns, Australia/Sydney]', name=0, freq=None)
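How parse_dates handles offsets such as +11:00 has varied between pandas versions, so the tz_localize('UTC') step may not apply everywhere (it raises on an index that is already tz-aware). As a hedged alternative, the sketch below reads the index as plain strings and parses it explicitly with pd.to_datetime(..., utc=True), which should give a UTC-aware index either way. The inline csv_text is a hypothetical stand-in for temp.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for temp.csv (header row included).
csv_text = (
    ",col\n"
    "2004-01-02 01:00:00+11:00,0\n"
    "2004-01-02 02:00:00+11:00,1\n"
)

# Read the index as plain strings; no date parsing at this stage.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# utc=True produces a tz-aware UTC index from the offset-bearing strings;
# tz_convert then expresses the same instants in Sydney local time.
df.index = pd.to_datetime(df.index, utc=True).tz_convert("Australia/Sydney")

print(df.index[0])  # 2004-01-02 01:00:00+11:00
```

This keeps the conversion in one vectorized call on the whole index rather than parsing row by row.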
Sample data for testing:
idx = pd.date_range('2004-01-02 01:00:00', periods=10, freq='H', tz='Australia/Sydney')
dfn = pd.DataFrame({'col':range(len(idx))}, index=idx)
print (dfn)
col
2004-01-02 01:00:00+11:00 0
2004-01-02 02:00:00+11:00 1
2004-01-02 03:00:00+11:00 2
2004-01-02 04:00:00+11:00 3
2004-01-02 05:00:00+11:00 4
2004-01-02 06:00:00+11:00 5
2004-01-02 07:00:00+11:00 6
2004-01-02 08:00:00+11:00 7
2004-01-02 09:00:00+11:00 8
2004-01-02 10:00:00+11:00 9
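With this sample, the header mismatch from the question can be reproduced without touching the disk. The following is only a sketch: it uses to_csv() without a path (which returns the CSV as a string) and io.StringIO in place of temp.csv, and it passes freq='60min' rather than 'H' on the assumption that newer pandas versions no longer accept the 'H' alias.

```python
import io
import pandas as pd

# Same shape as the sample above; "60min" replaces the "H" alias.
idx = pd.date_range("2004-01-02 01:00:00", periods=10, freq="60min",
                    tz="Australia/Sydney")
dfn = pd.DataFrame({"col": range(len(idx))}, index=idx)

# With no path given, to_csv returns the CSV text (header included).
csv_text = dfn.to_csv()

# Header written but not read: the header line becomes a data row,
# and its empty first field shows up as NaN in the index.
bad = pd.read_csv(io.StringIO(csv_text), index_col=0, header=None)

# Header written and read: the row count matches the original frame.
good = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(len(dfn), len(bad), len(good))  # 10 11 10
print(pd.isna(bad.index[0]))          # True
```

The extra NaN row in bad corresponds to the first row of df.head() in the question.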