You can use the function tz_localize
to make a Timestamp or DateTimeIndex timezone aware, but how can you do the opposite: how can you convert a timezone aware Timestamp to a naive one, while preserving its timezone?
An example:
In [82]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10, freq='s', tz="Europe/Brussels")
In [83]: t
Out[83]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels
I could remove the timezone by setting it to None, but then the result is converted to UTC (12 o'clock became 10):
In [86]: t.tz = None
In [87]: t
Out[87]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 10:00:00, ..., 2013-05-18 10:00:09]
Length: 10, Freq: S, Timezone: None
Is there another way I can convert a DateTimeIndex to timezone naive, but while preserving the timezone it was set in?
Some context on the reason I am asking this: I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on).
But for some reason, I have to deal with a timezone-aware timeseries in my local timezone (Europe/Brussels). As all my other data are timezone naive (but represented in my local timezone), I want to convert this timeseries to naive to further work with it, but it also has to be represented in my local timezone (so just remove the timezone info, without converting the user-visible time to UTC).
I know the time is actually internal stored as UTC and only converted to another timezone when you represent it, so there has to be some kind of conversion when I want to "delocalize" it. For example, with the python datetime module you can "remove" the timezone like this:
In [119]: d = pd.Timestamp("2013-05-18 12:00:00", tz="Europe/Brussels")
In [120]: d
Out[120]: <Timestamp: 2013-05-18 12:00:00+0200 CEST, tz=Europe/Brussels>
In [121]: d.replace(tzinfo=None)
Out[121]: <Timestamp: 2013-05-18 12:00:00>
So, based on this, I could do the following, but I suppose this will not be very efficient when working with a larger timeseries:
In [124]: t
Out[124]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels
In [125]: pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
Out[125]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: None, Timezone: None
To answer my own question, this functionality has been added to pandas in the meantime. Starting from pandas 0.15.0, you can use tz_localize(None)
to remove the timezone resulting in local time.
See the whatsnew entry: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#timezone-handling-improvements
So with my example from above:
In [4]: t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H',
tz= "Europe/Brussels")
In [5]: t
Out[5]: DatetimeIndex(['2013-05-18 12:00:00+02:00', '2013-05-18 13:00:00+02:00'],
dtype='datetime64[ns, Europe/Brussels]', freq='H')
using tz_localize(None)
removes the timezone information resulting in naive local time:
In [6]: t.tz_localize(None)
Out[6]: DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'],
dtype='datetime64[ns]', freq='H')
Further, you can also use tz_convert(None)
to remove the timezone information but converting to UTC, so yielding naive UTC time:
In [7]: t.tz_convert(None)
Out[7]: DatetimeIndex(['2013-05-18 10:00:00', '2013-05-18 11:00:00'],
dtype='datetime64[ns]', freq='H')
This is much more performant than the datetime.replace
solution:
In [31]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10000, freq='H',
tz="Europe/Brussels")
In [32]: %timeit t.tz_localize(None)
1000 loops, best of 3: 233 µs per loop
In [33]: %timeit pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
10 loops, best of 3: 99.7 ms per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With