Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

People also ask

How do I remove a timestamp timezone?

tz_localize(None) method can be applied to the dataframe column to remove the timezone information. The output similar to the above example reflects that after manipulation, the UTC timezone information is no longer present in the timestamp column.

How do I remove Tzinfo from datetime?

To remove timestamp, tzinfo has to be set None when calling replace() function. First, create a DateTime object with current time using datetime. now(). The DateTime object was then modified to contain the timezone information as well using the timezone.

How do I make datetime timezone aware?

Timezone aware object using datetime now(). time() function of datetime module. Then we will replace the value of the timezone in the tzinfo class of the object using the replace() function. After that convert the date value into ISO 8601 format using the isoformat() method.

To answer my own question, this functionality has been added to pandas in the meantime. Starting from pandas 0.15.0, you can use tz_localize(None) to remove the timezone resulting in local time.
See the whatsnew entry: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#timezone-handling-improvements

So with my example from above:

In [4]: t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H',
                          tz= "Europe/Brussels")

In [5]: t
Out[5]: DatetimeIndex(['2013-05-18 12:00:00+02:00', '2013-05-18 13:00:00+02:00'],
                       dtype='datetime64[ns, Europe/Brussels]', freq='H')

using tz_localize(None) removes the timezone information resulting in naive local time:

In [6]: t.tz_localize(None)
Out[6]: DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'], 
                      dtype='datetime64[ns]', freq='H')

Further, you can also use tz_convert(None) to remove the timezone information but converting to UTC, so yielding naive UTC time:

In [7]: t.tz_convert(None)
Out[7]: DatetimeIndex(['2013-05-18 10:00:00', '2013-05-18 11:00:00'], 
                      dtype='datetime64[ns]', freq='H')

This is much more performant than the datetime.replace solution:

In [31]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10000, freq='H',
                           tz="Europe/Brussels")

In [32]: %timeit t.tz_localize(None)
1000 loops, best of 3: 233 µs per loop

In [33]: %timeit pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
10 loops, best of 3: 99.7 ms per loop

Because I always struggle to remember, a quick summary of what each of these do:

>>> pd.Timestamp.now()  # naive local time
Timestamp('2019-10-07 10:30:19.428748')

>>> pd.Timestamp.utcnow()  # tz aware UTC
Timestamp('2019-10-07 08:30:19.428748+0000', tz='UTC')

>>> pd.Timestamp.now(tz='Europe/Brussels')  # tz aware local time
Timestamp('2019-10-07 10:30:19.428748+0200', tz='Europe/Brussels')

>>> pd.Timestamp.now(tz='Europe/Brussels').tz_localize(None)  # naive local time
Timestamp('2019-10-07 10:30:19.428748')

>>> pd.Timestamp.now(tz='Europe/Brussels').tz_convert(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')

>>> pd.Timestamp.utcnow().tz_localize(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')

>>> pd.Timestamp.utcnow().tz_convert(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')

I think you can't achieve what you want in a more efficient manner than you proposed.

The underlying problem is that the timestamps (as you seem aware) are made up of two parts. The data that represents the UTC time, and the timezone, tz_info. The timezone information is used only for display purposes when printing the timezone to the screen. At display time, the data is offset appropriately and +01:00 (or similar) is added to the string. Stripping off the tz_info value (using tz_convert(tz=None)) doesn't doesn't actually change the data that represents the naive part of the timestamp.

So, the only way to do what you want is to modify the underlying data (pandas doesn't allow this... DatetimeIndex are immutable -- see the help on DatetimeIndex), or to create a new set of timestamp objects and wrap them in a new DatetimeIndex. Your solution does the latter:

pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])

For reference, here is the replace method of Timestamp (see tslib.pyx):

def replace(self, **kwds):
    return Timestamp(datetime.replace(self, **kwds),
                     offset=self.offset)

You can refer to the docs on datetime.datetime to see that datetime.datetime.replace also creates a new object.

If you can, your best bet for efficiency is to modify the source of the data so that it (incorrectly) reports the timestamps without their timezone. You mentioned:

I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on)

I'd be curious what extra hassle you are referring to. I recommend as a general rule for all software development, keep your timestamp 'naive values' in UTC. There is little worse than looking at two different int64 values wondering which timezone they belong to. If you always, always, always use UTC for the internal storage, then you will avoid countless headaches. My mantra is Timezones are for human I/O only.

The accepted solution does not work when there are multiple different timezones in a Series. It throws ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

The solution is to use the apply method.

Please see the examples below:

# Let's have a series `a` with different multiple timezones. 
> a
0    2019-10-04 16:30:00+02:00
1    2019-10-07 16:00:00-04:00
2    2019-09-24 08:30:00-07:00
Name: localized, dtype: object

> a.iloc[0]
Timestamp('2019-10-04 16:30:00+0200', tz='Europe/Amsterdam')

# trying the accepted solution
> a.dt.tz_localize(None)
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

# Make it tz-naive. This is the solution:
> a.apply(lambda x:x.tz_localize(None))
0   2019-10-04 16:30:00
1   2019-10-07 16:00:00
2   2019-09-24 08:30:00
Name: localized, dtype: datetime64[ns]

# a.tz_convert() also does not work with multiple timezones, but this works:
> a.apply(lambda x:x.tz_convert('America/Los_Angeles'))
0   2019-10-04 07:30:00-07:00
1   2019-10-07 13:00:00-07:00
2   2019-09-24 08:30:00-07:00
Name: localized, dtype: datetime64[ns, America/Los_Angeles]

Setting the tz attribute of the index explicitly seems to work:

ts_utc = ts.tz_convert("UTC")
ts_utc.index.tz = None

Building on D.A.'s suggestion that "the only way to do what you want is to modify the underlying data" and using numpy to modify the underlying data...

This works for me, and is pretty fast:

def tz_to_naive(datetime_index):
    """Converts a tz-aware DatetimeIndex into a tz-naive DatetimeIndex,
    effectively baking the timezone into the internal representation.

    Parameters
    ----------
    datetime_index : pandas.DatetimeIndex, tz-aware

    Returns
    -------
    pandas.DatetimeIndex, tz-naive
    """
    # Calculate timezone offset relative to UTC
    timestamp = datetime_index[0]
    tz_offset = (timestamp.replace(tzinfo=None) - 
                 timestamp.tz_convert('UTC').replace(tzinfo=None))
    tz_offset_td64 = np.timedelta64(tz_offset)

    # Now convert to naive DatetimeIndex
    return pd.DatetimeIndex(datetime_index.values + tz_offset_td64)

Late contribution but just came across something similar in Python datetime and pandas give different timestamps for the same date.

If you have timezone-aware datetime in pandas, technically, tz_localize(None) changes the POSIX timestamp (that is used internally) as if the local time from the timestamp was UTC. Local in this context means local in the specified timezone. Ex:

import pandas as pd

t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H', tz="US/Central")
# DatetimeIndex(['2013-05-18 12:00:00-05:00', '2013-05-18 13:00:00-05:00'], dtype='datetime64[ns, US/Central]', freq='H')

t_loc = t.tz_localize(None)
# DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'], dtype='datetime64[ns]', freq='H')

# offset in seconds according to timezone:
(t_loc.values-t.values)//1e9
# array([-18000, -18000], dtype='timedelta64[ns]')

Note that this will leave you with strange things during DST transitions, e.g.

t = pd.date_range(start="2020-03-08 01:00:00", periods=2, freq='H', tz="US/Central")
(t.values[1]-t.values[0])//1e9
# numpy.timedelta64(3600,'ns')

t_loc = t.tz_localize(None)
(t_loc.values[1]-t_loc.values[0])//1e9
# numpy.timedelta64(7200,'ns')

In contrast, tz_convert(None) does not modify the internal timestamp, it just removes the tzinfo.

t_utc = t.tz_convert(None)
(t_utc.values-t.values)//1e9
# array([0, 0], dtype='timedelta64[ns]')

My bottom line would be: stick with timezone-aware datetime if you can or only use t.tz_convert(None) which doesn't modify the underlying POSIX timestamp. Just keep in mind that you're practically working with UTC then.

(Python 3.8.2 x64 on Windows 10, pandas v1.0.5.)

Related questions
                            
                                In Python, how do I use urllib to see if a website is 404 or 200?
                            
                                How to schedule a function to run every hour on Flask?
                            
                                How do I convert strings in a Pandas data frame to a 'date' data type?
                            
                                The order of keys in dictionaries
                            
                                Writing a dict to txt file and reading it back?
                            
                                Show distinct column values in pyspark dataframe
                            
                                Standard deviation of a list
                            
                                Python Matplotlib Y-Axis ticks on Right Side of Plot
                            
                                How to run Spyder in virtual environment?
                            
                                What does "three dots" in Python mean when indexing what looks like a number?
                            
                                Is "x < y < z" faster than "x < y and y < z"?
                            
                                Creating hidden arguments with Python argparse
                            
                                Threading in a PyQt application: Use Qt threads or Python threads?
                            
                                What is the difference between setUp() and setUpClass() in Python unittest?
                            
                                What is the most pythonic way to check if an object is a number?
                            
                                Revert the `--no-site-packages` option with virtualenv
                            
                                Reading in environment variables from an environment file
                            
                                How to programmatically generate markdown output in Jupyter notebooks?
                            
                                Creating functions in a loop
                            
                                Matplotlib connect scatterplot points with line - Python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

Tags:

python

timezone

datetime

pandas

People also ask

Recent Activity

Donate For Us