
ISIN function does not work for dates

Tags:

python

pandas

import pandas as pd

d = {'Dates': [pd.Timestamp('2013-01-02'),
               pd.Timestamp('2013-01-03'),
               pd.Timestamp('2013-01-04')],
     'Num1': [1, 2, 3],
     'Num2': [-1, -2, -3]}

df = pd.DataFrame(data=d)

We have this data frame

                 Dates  Num1  Num2
0  2013-01-02 00:00:00     1    -1
1  2013-01-03 00:00:00     2    -2
2  2013-01-04 00:00:00     3    -3

Dates    datetime64[ns]
Num1              int64
Num2              int64
dtype: object

But this call returns False for every row:

df['Dates'].isin([pd.Timestamp('2013-01-04')])  

0    False
1    False
2    False
Name: Dates, dtype: bool  

I am expecting True for the date "2013-01-04". What am I missing? I am using pandas 0.12, the latest version.

DataByDavid asked Sep 28 '13 18:09



3 Answers

This worked for me.

df['Dates'].isin(np.array([pd.Timestamp('2013-01-04')]).astype('datetime64[ns]')) 

I know it is a bit verbose, but it helps if you need this to work right away. See https://github.com/pydata/pandas/issues/5021 for more details.
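For anyone who wants to try the idea end to end, here is a minimal sketch. It assumes a reasonably recent pandas and swaps in pd.to_datetime(...).to_numpy() to build the datetime64 lookup array instead of the np.array(...).astype(...) call above; the names wanted and mask are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "Dates": pd.to_datetime(["2013-01-02", "2013-01-03", "2013-01-04"]),
    "Num1": [1, 2, 3],
    "Num2": [-1, -2, -3],
})

# Convert the lookup values to a NumPy datetime64 array first, so that
# isin compares datetime64 values against datetime64 values.
wanted = pd.to_datetime([pd.Timestamp("2013-01-04")]).to_numpy()

mask = df["Dates"].isin(wanted)
print(mask.tolist())
print(df[mask])
```

The mask can then be used for filtering as usual, as in the last line.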

livinston answered Nov 18 '22 12:11



I have the same version of pandas, and @DSM's answer was helpful. Another workaround would be to use the apply method:

>>> df.Dates.apply(lambda date: date in [pd.Timestamp('2013-01-04')])

0    False
1    False
2     True
Name: Dates, dtype: bool
JoC answered Nov 18 '22 13:11



Yep, that looks like a bug to me. It comes down to this part of lib.ismember:

for i in range(n):
    val = util.get_value_at(arr, i)
    if val in values:
        result[i] = 1
    else: 
        result[i] = 0

val is a numpy.datetime64 object, and values is a set of Timestamp objects. Testing membership should work, but doesn't:

>>> import pandas as pd, numpy as np
>>> ts = pd.Timestamp('2013-01-04')
>>> ts
Timestamp('2013-01-04 00:00:00', tz=None)
>>> dt64 = np.datetime64(ts)
>>> dt64
numpy.datetime64('2013-01-03T19:00:00.000000-0500')
>>> dt64 == ts
True
>>> dt64 in [ts]
True
>>> dt64 in {ts}
False

I think usually that behaviour -- working in a list, not working in a set -- is due to something going wrong with __hash__:

>>> hash(dt64)
1357257600000000
>>> hash(ts)
-7276108168457487299

A set looks an element up by its hash before it ever checks equality, so membership testing fails when two equal objects hash differently. I can think of a few ways to fix this, but choosing the best one would depend upon design choices they made when implementing Timestamps that I'm not qualified to comment on.
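To see the same effect without pandas or numpy, here is a small pure-Python sketch. BadStamp and GoodStamp are hypothetical stand-ins, not the real Timestamp class, and ismember is only my plain-Python rendering of the loop quoted above, not the actual library code.

```python
class BadStamp:
    """Hypothetical wrapper: equal to its raw integer, but hashes differently."""

    def __init__(self, ns):
        self.ns = ns

    def __eq__(self, other):
        if isinstance(other, BadStamp):
            return self.ns == other.ns
        if isinstance(other, int):
            return self.ns == other
        return NotImplemented

    def __hash__(self):
        # Deliberately inconsistent with hash(int): off by one.
        return hash(self.ns) + 1


class GoodStamp(BadStamp):
    """Same equality, but the hash agrees with the raw integer's hash."""

    def __hash__(self):
        return hash(self.ns)


def ismember(arr, values):
    """Plain-Python version of the membership loop quoted above."""
    return [1 if val in values else 0 for val in arr]


raw = 1357257600000000000  # 2013-01-04 in nanoseconds since the epoch

in_list = raw in [BadStamp(raw)]      # list membership only uses ==
in_bad_set = raw in {BadStamp(raw)}   # set membership checks the hash first
in_good_set = raw in {GoodStamp(raw)}

print(in_list, in_bad_set, in_good_set)
print(ismember([raw], {BadStamp(raw)}), ismember([raw], {GoodStamp(raw)}))
```

Making the hash agree with equality, as GoodStamp does, is one of the possible fixes; whether it suits the real Timestamp type is a separate design question.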

DSM answered Nov 18 '22 14:11
