A call to isinstance returns True outside but False inside a map over a series (and an applymap over a dataframe)...
import pandas as pd
import pytz
s = pd.Series([pd.Timestamp(2018,5,11,6,0,0,0, pytz.timezone('UTC'))])
s
0 2018-05-11 06:00:00+00:00
dtype: datetime64[ns, UTC]
A call to isinstance for the single value in this series yields True.
isinstance(s.iloc[0], pd.Timestamp)
True
Inside a map over the series it gives True.
s.map(lambda x: isinstance(x, pd.Timestamp)).iloc[0]
True
But if we try something contingent on that value, say convert to a string...
s.map(lambda x: x.isoformat() if isinstance(x, pd.Timestamp) else x).iloc[0]
Timestamp('2018-05-11 06:00:00+0000', tz='UTC')
...it appears to have returned False, and the method isoformat is not called. (The actual method call is irrelevant, because it's never called.)
Looking at the source of .map, it appears that pandas checks whether the Series has an extension type. As the OP points out, this behaves differently depending on whether the timestamps carry a time zone. Consider a tz-naive and a tz-aware series:
s1 = pd.Series([
pd.Timestamp(2018,5,11,6,0,0,0),
])
s2 = pd.Series([
pd.Timestamp(2018,5,11,6,0,0,0, pytz.timezone('UTC')),
])
When .map is called, it checks pd.api.types.is_extension_type(s). If s is s1, this returns False, while if s is s2 it returns True.
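The difference between s1 and s2 comes down to their dtypes. This sketch rebuilds both series from strings with tz="UTC" instead of pytz (an assumption that should give the same tz-aware dtype). Because newer pandas releases have deprecated is_extension_type, it checks the dtype against pd.api.extensions.ExtensionDtype directly as a stand-in for that call:

```python
import pandas as pd

# Same values as s1 and s2 above; tz="UTC" stands in for pytz.timezone('UTC').
s1 = pd.Series([pd.Timestamp("2018-05-11 06:00:00")])
s2 = pd.Series([pd.Timestamp("2018-05-11 06:00:00", tz="UTC")])

# A tz-naive series uses a plain NumPy datetime64 dtype, which is not a
# pandas extension dtype ...
print(isinstance(s1.dtype, pd.api.extensions.ExtensionDtype))  # False

# ... whereas a tz-aware series uses DatetimeTZDtype, which is one.
print(isinstance(s2.dtype, pd.DatetimeTZDtype))                 # True
print(isinstance(s2.dtype, pd.api.extensions.ExtensionDtype))  # True
```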
As a result, s2.map turns into s2._values.map. Since s2._values is of type DatetimeIndex, DatetimeIndex's implementation of .map is called. It first tries to call f(s2._values) on the whole index, and only falls back to applying f to each element if that call raises an error.
In this case, f = lambda x: x.isoformat() if isinstance(x, pd.Timestamp) else x. No error occurs, because f checks isinstance(s2._values, pd.Timestamp), which is False, so f simply returns its argument unchanged: f(s2._values) returns s2._values itself. Indeed, this can be verified: s2._values is f(s2._values) evaluates to True.
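The try-first, fall-back-later behaviour can be illustrated without depending on a particular pandas version. The sketch below is a hypothetical, simplified model of the dispatch described above; try_whole_then_elementwise and strict_convert are illustrative names, not pandas code. It shows that the question's function (called convert here) hands the whole DatetimeIndex back without raising, so the fallback is never reached, whereas a function that raises on non-Timestamps does end up being applied element by element:

```python
import pandas as pd


def convert(x):
    # The function from the question: only Timestamps are converted to strings.
    return x.isoformat() if isinstance(x, pd.Timestamp) else x


def strict_convert(x):
    # Hypothetical variant that raises instead of passing non-Timestamps through.
    if not isinstance(x, pd.Timestamp):
        raise TypeError("expected a Timestamp, got " + type(x).__name__)
    return x.isoformat()


def try_whole_then_elementwise(values, func):
    # Hypothetical, simplified model of the dispatch described above (not the
    # real pandas source): call func on the whole index first, and only fall
    # back to calling it on each element if that first call raises.
    try:
        return func(values)
    except Exception:
        return pd.Index([func(v) for v in values])


idx = pd.DatetimeIndex(["2018-05-11 06:00:00"], tz="UTC")

# convert() does not raise on the whole index; it returns it unchanged,
# so the element-wise fallback never runs.
print(convert(idx) is idx)                              # True
print(try_whole_then_elementwise(idx, convert) is idx)  # True

# strict_convert() raises on the index, so the fallback converts each Timestamp.
print(list(try_whole_then_elementwise(idx, strict_convert)))
# ['2018-05-11T06:00:00+00:00']
```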
One workaround is to make sure the is_extension_type branch is not taken, e.g. by converting to object dtype first with s.astype(object).map.
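A minimal sketch of that workaround on the question's series (built here with tz="UTC" rather than pytz, an assumption that should not change the outcome):

```python
import pandas as pd

s = pd.Series([pd.Timestamp("2018-05-11 06:00:00", tz="UTC")])

# After astype(object) the series holds ordinary Timestamp objects, so map
# applies the function to each element rather than to the whole index.
result = s.astype(object).map(
    lambda x: x.isoformat() if isinstance(x, pd.Timestamp) else x
)
print(result.iloc[0])  # 2018-05-11T06:00:00+00:00
```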
It looks like datetime series are converted to a DatetimeIndex and the whole index is then passed to the function. Of course, the index fails the isinstance check.
def f(x):
    # Show exactly what map passes in, then report whether it is a Timestamp.
    print(x)
    if isinstance(x, pd.Timestamp):
        print('{} == {}'.format(type(x).__name__, pd.Timestamp.__name__))
        return x.isoformat()
    else:
        print('{} != {}'.format(type(x).__name__, pd.Timestamp.__name__))
        return x

print(s.map(f))
Output:
DatetimeIndex(['2018-05-11 06:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
DatetimeIndex != Timestamp
0 2018-05-11 06:00:00+00:00
dtype: datetime64[ns, UTC]
This does not happen with all series; it seems to depend on the dtype. Maybe it happens with all extension types, or only with datetimes.
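One way to test that guess on whichever pandas version is installed is to record the type of every argument the mapped function receives, for a few extension and non-extension dtypes. The helper name received_types and the choice of dtypes are illustrative assumptions. On a version affected by this behaviour you would expect a container type such as DatetimeIndex to show up for the tz-aware series, while a version that maps element by element should only report scalar types:

```python
import pandas as pd


def received_types(series):
    # Hypothetical helper: collect the type names that map passes to the function.
    seen = []

    def probe(x):
        seen.append(type(x).__name__)
        return x  # return the input unchanged so map's output is irrelevant

    series.map(probe)
    return sorted(set(seen))


examples = {
    "naive datetime": pd.Series(pd.to_datetime(["2018-05-11 06:00:00"])),
    "tz-aware datetime": pd.Series(pd.to_datetime(["2018-05-11 06:00:00"], utc=True)),
    "categorical": pd.Series(["a", "b"], dtype="category"),
    "nullable integer": pd.Series([1, 2], dtype="Int64"),
}

for name, series in examples.items():
    is_ext = isinstance(series.dtype, pd.api.extensions.ExtensionDtype)
    print(name, "| extension dtype:", is_ext, "| received:", received_types(series))
```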