Why does isinstance return the wrong value only inside a series map?

Tags:

python

pandas

A call to isinstance returns True outside but False inside a map over a series (and an applymap over a dataframe)...

import pandas as pd
import pytz
s = pd.Series([pd.Timestamp(2018,5,11,6,0,0,0, pytz.timezone('UTC'))])
s

0   2018-05-11 06:00:00+00:00
dtype: datetime64[ns, UTC]

A call to isinstance for the single value in this series yields True.

isinstance(s.iloc[0], pd.Timestamp)
True

Inside a map over the series it also gives True.

s.map(lambda x: isinstance(x, pd.Timestamp)).iloc[0]
True

But if we try something contingent on that value, say convert to a string...

s.map(lambda x: x.isoformat() if isinstance(x, pd.Timestamp) else x).iloc[0]
Timestamp('2018-05-11 06:00:00+0000', tz='UTC')

...it appears to have returned False, and isoformat is never called (the particular method is irrelevant, since it is not called at all).

Ymareth asked May 14 '18 15:05


2 Answers

Looking at the source of .map, it appears that pandas checks whether the dtype of the Series is an extension type. As the OP points out, this behaves differently depending on whether the timestamps are time-zone aware. Consider these two series:

s1 = pd.Series([
    pd.Timestamp(2018,5,11,6,0,0,0),
])

s2 = pd.Series([
    pd.Timestamp(2018,5,11,6,0,0,0, pytz.timezone('UTC')),
])

When .map is called, it checks pd.api.types.is_extension_type(s). For s = s1 this returns False, while for s = s2 it returns True.
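pd.api.types.is_extension_type is deprecated in newer pandas releases and missing from the most recent ones, so that exact call may not work today. A rough modern equivalent, assuming a recent pandas, is to ask whether the Series' dtype is an ExtensionDtype. This sketch rebuilds s1 and s2 with tz="UTC" instead of pytz (an assumption for brevity):

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# Same values as s1 / s2 above, built without pytz.
s1 = pd.Series([pd.Timestamp("2018-05-11 06:00:00")])            # tz-naive
s2 = pd.Series([pd.Timestamp("2018-05-11 06:00:00", tz="UTC")])  # tz-aware

# tz-naive datetimes are stored with a plain NumPy datetime64 dtype,
# tz-aware ones with pandas' DatetimeTZDtype, which is an ExtensionDtype.
print(s1.dtype, isinstance(s1.dtype, ExtensionDtype))
print(s2.dtype, isinstance(s2.dtype, ExtensionDtype))
```

Only the tz-aware series takes the extension-type branch, which is why the surprising behaviour shows up for s2 and not for s1.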

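As a result, s2.map(f) is delegated to s2._values.map(f). Since s2._values is a DatetimeIndex, DatetimeIndex's own .map implementation is used. It first tries calling f on the whole index, f(s2._values), and only falls back to applying f element by element (via an object-dtype copy) if that call raises an error or does not return an Index. This is also why the bare isinstance lambda in the question works: applied to the whole index it returns the bool False, which is not an Index, so pandas falls back to the element-wise path.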

In this case, f = lambda x: x.isoformat() if isinstance(x, pd.Timestamp) else x. No error occurs: f evaluates isinstance(s2._values, pd.Timestamp), which is False, so f(s2._values) simply returns s2._values unchanged, and that is used as the result. You can verify this with s2._values is f(s2._values), which evaluates to True.
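The try-the-whole-container-first logic can be modelled in a few lines of plain Python. FakeTimestamp and FakeDatetimeIndex are hypothetical stand-ins, not pandas classes, and map is only an approximation of the old DatetimeIndex.map behaviour described above (including the assumption that a non-index result counts as a failure), not pandas' actual code:

```python
class FakeTimestamp:
    """Hypothetical stand-in for a scalar timestamp (not pandas.Timestamp)."""

    def __init__(self, text):
        self.text = text

    def isoformat(self):
        return self.text


class FakeDatetimeIndex(list):
    """Hypothetical stand-in for a DatetimeIndex: a list of FakeTimestamps."""

    def map(self, f):
        try:
            # Step 1: call f on the WHOLE container, not on each element.
            result = f(self)
            # Assumption: a result that is not index-like is treated as a failure.
            if not isinstance(result, FakeDatetimeIndex):
                raise TypeError("map function must return an index-like object")
            return result
        except Exception:
            # Step 2 (fallback): apply f to each element individually.
            return [f(x) for x in self]


idx = FakeDatetimeIndex([FakeTimestamp("2018-05-11T06:00:00+00:00")])

# f(idx) is the bool False -> rejected -> element-wise fallback
check = idx.map(lambda x: isinstance(x, FakeTimestamp))

# f(idx) returns idx itself -> accepted as the result; isoformat() never runs
converted = idx.map(lambda x: x.isoformat() if isinstance(x, FakeTimestamp) else x)

print(check)             # [True]
print(converted is idx)  # True
```

The first call mirrors the question's isinstance-only lambda, the second its isoformat lambda: the function that passes non-timestamps through unchanged is exactly the one whose result on the whole container looks like a valid answer.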

One workaround is to make sure the extension-type branch is not taken, e.g. by converting to object dtype first: s.astype(object).map(f).
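A minimal sketch of that workaround on the question's data. It builds the timestamp with tz="UTC" instead of pytz, and to_iso is a hypothetical named version of the question's lambda:

```python
import pandas as pd

s = pd.Series([pd.Timestamp("2018-05-11 06:00:00", tz="UTC")])


def to_iso(x):
    # Convert timestamps to ISO-8601 strings; leave anything else untouched.
    return x.isoformat() if isinstance(x, pd.Timestamp) else x


# After astype(object) the Series holds one pd.Timestamp object per element
# and no longer has an extension dtype, so map() calls to_iso per element.
result = s.astype(object).map(to_iso)
print(result.iloc[0])
```

Newer pandas versions may already map tz-aware series element by element, in which case the extra cast is harmless.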

hilberts_drinking_problem answered Sep 21 '22 18:09


It looks like datetime series are converted to a DatetimeIndex, and then the whole index is passed to the function. Of course, the index fails the isinstance check.

def f(x):
    print(x)
    if isinstance(x, pd.Timestamp):
        print('{} == {}'.format(type(x).__name__, pd.Timestamp.__name__))
        return x.isoformat()
    else:
        print('{} != {}'.format(type(x).__name__, pd.Timestamp.__name__))
        return x

print(s.map(f))

Output:

DatetimeIndex(['2018-05-11 06:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
DatetimeIndex != Timestamp
0   2018-05-11 06:00:00+00:00
dtype: datetime64[ns, UTC]

This does not happen with all series; it seems to depend on the dtype. Maybe it happens with all extension types, or only with datetimes.

Stop harming Monica answered Sep 19 '22 18:09