I'm working with datetime values converted to strings (years) in a dataframe. I would like to check whether a given year exists in my dataframe.year_as_string column using the in operator. However, my expression unexpectedly evaluates to False (see the second print statement). Why does this happen?
NB: I can probably solve my problem in a simpler way (as in the 3rd print statement), but I am really curious as to why the second statement evaluates to False.
import pandas as pd
ind = pd.to_datetime(['2013-12-31', '2014-12-31'])
df = pd.DataFrame([1, 2], index=ind)
df = df.reset_index()
df.columns = ['year', 'value']
df['year_as_string'] = df.year.dt.strftime('%Y')
# 1. the string '2013' is equal to the first element of the list
print('2013' == df['year_as_string'][0])
# 2. but that same string is not 'in' the list?! Why does this evaluate to False?
print('2013' in df['year_as_string'])
# 3. I further saw that strftiming the DatetimeIndex itself does evaluate as I would expect
year = ind.strftime('%Y')
print('2013' in year)
The in operator applied to a pandas series checks the index, much like using in with a dictionary checks keys only. Instead, you can use in with the series' NumPy array representation:
'2013' in df['year_as_string'].values
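To make the index-versus-values distinction concrete, here is a minimal sketch. The series s and the dictionary d are hypothetical stand-ins for the year_as_string column in the question, not part of the original code:

```python
import pandas as pd

# Hypothetical stand-in for df['year_as_string']; default index labels are 0 and 1
s = pd.Series(['2013', '2014'])

label_hit = 0 in s               # 0 is an index label
value_miss = '2013' in s         # '2013' is a value, not an index label
value_hit = '2013' in s.values   # membership test on the underlying array

# A dict behaves the same way: `in` looks at keys only
d = {0: '2013', 1: '2014'}
dict_key_hit = 0 in d
dict_value_miss = '2013' in d

# If the years are the index labels, `in` does find them
by_year = pd.Series([1, 2], index=['2013', '2014'])
label_hit_by_year = '2013' in by_year

print(label_hit, value_miss, value_hit)
```

This is also why the third print statement in the question works: there, in is applied to an Index object, whose elements are the labels themselves.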
A more Pandorable approach would be to construct a Boolean series and then use pd.Series.any:
(df['year_as_string'] == '2013').any()
Equivalently:
df['year_as_string'].eq('2013').any()
Even better, avoid converting to strings unless absolutely necessary:
df['year_as_int'] = df['year'].dt.year
df['year_as_int'].eq(2013).any()
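As a rough sketch of how these value-based checks line up, the snippet below rebuilds a small frame like the one in the question and runs each variant. The helper has_year is a hypothetical name introduced only for illustration:

```python
import pandas as pd

# Small frame resembling the one in the question
df = pd.DataFrame({
    'year': pd.to_datetime(['2013-12-31', '2014-12-31']),
    'value': [1, 2],
})
df['year_as_string'] = df['year'].dt.strftime('%Y')


def has_year(frame, year):
    """Hypothetical helper: True if any datetime in frame['year'] falls in `year`."""
    return bool(frame['year'].dt.year.eq(year).any())


via_array = '2013' in df['year_as_string'].values      # array membership
via_eq = bool(df['year_as_string'].eq('2013').any())   # Boolean series + any
via_int = has_year(df, 2013)                           # no string conversion
missing = has_year(df, 2015)                           # year not present

print(via_array, via_eq, via_int, missing)
```

The integer variant skips the strftime step entirely, so it is probably the one to prefer when the strings are not needed for anything else.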
Your second statement checks the index labels, not the values of the column. If you want to check the values, you can use:
print('2013' in df.to_string(index=False, columns=['year_as_string']))
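One caveat with the to_string approach: in on a Python string is a substring test against the rendered text, so it can match partial values or the column header. A small sketch, using a hypothetical two-row frame in place of the one from the question:

```python
import pandas as pd

# Hypothetical frame with the same column as in the question
df = pd.DataFrame({'year_as_string': ['2013', '2014']})

# Render the column as plain text, header included
rendered = df.to_string(index=False, columns=['year_as_string'])

partial_in_text = '201' in rendered   # matches as a substring of '2013'
header_in_text = 'year' in rendered   # matches the column header
partial_in_values = bool(df['year_as_string'].eq('201').any())  # exact comparison

print(partial_in_text, header_in_text, partial_in_values)
```

For exact matches, comparing against the values themselves is the safer choice.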