Python in operator not working as expected when comparing string and strftime values [duplicate]

I'm working with datetime values converted to strings (years) in a dataframe. I would like to check whether a given year exists in my dataframe.year_as_string column using the in operator. However, my expression unexpectedly evaluates to False (see the second print statement). Why does this happen?

NB: I can probably solve my problem in a simpler way (as in the 3rd print statement), but I am really curious as to why the second statement evaluates to False.

import pandas as pd

ind = pd.to_datetime(['2013-12-31', '2014-12-31'])

df = pd.DataFrame([1, 2], index=ind)
df = df.reset_index()
df.columns = ['year', 'value']
df['year_as_string'] = df.year.dt.strftime('%Y')

# 1. the string '2013' is equal to the first element of the list
print('2013' == df['year_as_string'][0])

# 2. but that same string is not 'in' the list?! Why does this evaluate to False?
print('2013' in df['year_as_string'])

# 3. I further saw that strftiming the DatetimeIndex itself does evaluate as I would expect
year = ind.strftime('%Y')
print('2013' in year)
Niels Hameleers asked Dec 24 '22 04:12


2 Answers

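The in operator applied to a pandas Series checks the index labels, not the values, much like in on a dictionary checks only the keys. After reset_index() the index labels are 0 and 1, so '2013' is not found. To test the values instead, use in with the series' NumPy array representation: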

'2013' in df['year_as_string'].values
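To make the index-versus-values distinction concrete, here is a minimal sketch. The series s is a hypothetical stand-in for the question's year_as_string column, with the default 0, 1 index:

```python
import pandas as pd

# Hypothetical series mirroring the question's year_as_string column
s = pd.Series(['2013', '2014'])

# `in` on a Series tests the index labels (here 0 and 1), not the values
label_found = 0 in s

# '2013' is a value, not an index label, so this is False
value_found_via_series = '2013' in s

# Testing against the underlying array or a plain list checks the values
value_found_via_array = '2013' in s.values
value_found_via_list = '2013' in s.tolist()

print(label_found, value_found_via_series,
      value_found_via_array, value_found_via_list)
```

The same reasoning explains the third print statement in the question: ind.strftime('%Y') returns an Index, and in on an Index tests its labels, which in that case are the year strings themselves.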

A more Pandorable approach would be to construct a Boolean series and then use pd.Series.any:

(df['year_as_string'] == '2013').any()

Equivalently:

df['year_as_string'].eq('2013').any()

Even better, avoid converting to strings unless absolutely necessary:

df['year_as_int'] = df['year'].dt.year
df['year_as_int'].eq(2013).any()
jpp answered Dec 25 '22 18:12


Your second statement checks the index labels, not the values of the column. To check the values, you can use:

print('2013' in df.to_string(index=False, columns=['year_as_string']))
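One caveat worth keeping in mind: in on a str is a substring test, so this approach matches partial values as well. A small sketch, using a hypothetical two-row frame shaped like the question's data, shows the difference from an element-wise comparison:

```python
import pandas as pd

# Hypothetical frame mirroring the question's year_as_string column
df = pd.DataFrame({'year_as_string': ['2013', '2014']})

# Render the column as one block of text, as in the snippet above
rendered = df.to_string(index=False, columns=['year_as_string'])

# Substring matching: the full year is found, but so is a partial one
full_in_text = '2013' in rendered
partial_in_text = '201' in rendered

# Element-wise comparison only matches whole values
partial_in_values = df['year_as_string'].eq('201').any()

print(full_in_text, partial_in_text, partial_in_values)
```

If only exact matches should count, comparing the values element-wise is the safer choice.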
Jones1220 answered Dec 25 '22 18:12