To filter the rows whose column value contains a given string, something like data.sample_id.str.contains('hph') can be used
(answered before: check if string in pandas dataframe column is in list, or Check if string is in a pandas dataframe).
However, my lookup column contains empty cells. Therefore, str.contains() yields NaN values and I get a ValueError upon indexing:
ValueError: cannot index with vector containing NA / NaN values
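For reference, a minimal sketch that reproduces the error (the sample values here are made up):
import numpy as np
import pandas as pd

data = pd.DataFrame({'sample_id': ['hph_1', np.nan, 'zent_2']})
# with NaN in the column, str.contains returns an object Series: True, NaN, False
mask = data.sample_id.str.contains('hph')
data[mask]  # raises the ValueError above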
What works:
# get the integer positions of all rows whose sample_id contains 'zent'
mask = [index for index, item in enumerate(data.sample_id.values) if 'zent' in str(item)]
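Since the comprehension collects integer positions, the matching rows can then be pulled with iloc:
data.iloc[mask]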
Is there a more elegant and faster method (similar to str.contains()) than this one?
You can set the parameter na in str.contains to False:
print (df.a.str.contains('hph', na=False))
Using EdChum's sample:
df = pd.DataFrame({'a':['hph', np.NaN, 'sadhphsad', 'hello']})
print (df)
           a
0        hph
1        NaN
2  sadhphsad
3      hello
print (df.a.str.contains('hph', na=False))
0     True
1    False
2     True
3    False
Name: a, dtype: bool
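The mask can then be passed straight to the indexer to select only the matching rows, for example:
print (df[df.a.str.contains('hph', na=False)])
           a
0        hph
2  sadhphsad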
IIUC you can also filter those rows out:
data['sample'].dropna().str.contains('hph')
Example:
In [38]:
df = pd.DataFrame({'a':['hph', np.NaN, 'sadhphsad', 'hello']})
df
Out[38]:
           a
0        hph
1        NaN
2  sadhphsad
3      hello
In [39]:
df['a'].dropna().str.contains('hph')
Out[39]:
0     True
2     True
3    False
Name: a, dtype: bool
So by calling dropna first you can then safely use str.contains on the Series, as there will be no NaN values.
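Note that the dropna mask is shorter than df, so it cannot be passed to df[...] directly; one way (a sketch, not part of the original answer) to recover the matching rows is to select the True index labels:
mask = df['a'].dropna().str.contains('hph')
df.loc[mask[mask].index]
           a
0        hph
2  sadhphsad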
Another way to handle the null values would be to use notnull:
In [43]:
(df['a'].notnull()) & (df['a'].str.contains('hph'))
Out[43]:
0 True
1 False
2 True
3 False
Name: a, dtype: bool
but I think passing na=False would be cleaner (@jezrael's answer).
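For completeness, the combined mask from Out[43] filters rows the same way:
df[(df['a'].notnull()) & (df['a'].str.contains('hph'))]
           a
0        hph
2  sadhphsad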