
pandas string contains lookup: NaN leads to ValueError

Tags:

pandas

To filter the rows in which a column value contains a given string, you can use something like data.sample_id.str.contains('hph') (answered before: check if string in pandas dataframe column is in list, or Check if string is in a pandas dataframe).

However, my lookup column contains empty cells. Therefore, str.contains() yields NaN values for those cells, and I get a ValueError when I use the result for indexing:

ValueError: cannot index with vector containing NA / NaN values
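A minimal sketch that reproduces the problem described above. The column values are hypothetical, and dtype=object is forced as an assumption so that the empty cell is stored as a plain NaN; the exact error message depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical data; dtype=object is forced so the empty cell stays a plain NaN.
data = pd.DataFrame(
    {'sample_id': pd.Series(['hph_01', np.nan, 'zent_02'], dtype=object)}
)

# The missing value propagates into the result instead of becoming False.
mask = data['sample_id'].str.contains('hph')
has_missing = bool(mask.isna().any())

# Indexing with a mask that contains NaN is rejected.
try:
    data[mask]
    raised = False
except ValueError as err:
    raised = True
    print(f'ValueError: {err}')
```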

What works:

# positions of all rows whose sample_id contains 'zent' (str() turns NaN into 'nan')
mask = [index for index, item in enumerate(data.sample_id.values) if 'zent' in str(item)]

Is there a more elegant and faster method (similar to str.contains()) than this one?

Moritz asked Aug 08 '16 10:08


2 Answers

You can set the na parameter of str.contains to False, so that missing values are treated as non-matches:

print(df.a.str.contains('hph', na=False))

Using EdChum's sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['hph', np.nan, 'sadhphsad', 'hello']})
print(df)
           a
0        hph
1        NaN
2  sadhphsad
3      hello

print(df.a.str.contains('hph', na=False))
0     True
1    False
2     True
3    False
Name: a, dtype: bool
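As a short sketch of how this mask can then be used for the filtering asked about in the question, the example below reuses the sample frame from above; the names mask and filtered are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['hph', np.nan, 'sadhphsad', 'hello']})

# na=False makes the missing value count as "no match".
mask = df['a'].str.contains('hph', na=False)

# The mask no longer contains NaN, so it can be used for boolean indexing.
filtered = df[mask]
print(filtered)
```

Because the mask keeps the same index as df, it lines up row by row with the original frame.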
jezrael answered Oct 25 '22 04:10


If I understand correctly, you can also drop the rows with missing values first:

data['sample_id'].dropna().str.contains('hph')

Example:

In [38]:
df = pd.DataFrame({'a': ['hph', np.nan, 'sadhphsad', 'hello']})
df

Out[38]:
           a
0        hph
1        NaN
2  sadhphsad
3      hello

In [39]:
df['a'].dropna().str.contains('hph')

Out[39]:
0     True
2     True
3    False
Name: a, dtype: bool

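So by calling dropna first, you can safely use str.contains on the Series, because no NaN values remain. Note that the result only covers the non-null rows (index 1 is absent above), so it is shorter than the original column.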
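Since the dropna result has fewer rows than the original frame, one possible way to get back to the matching rows is to select by index label. This is only a sketch of that idea; the names mask, matching_labels and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['hph', np.nan, 'sadhphsad', 'hello']})

# The mask only covers the non-null rows (labels 0, 2 and 3).
mask = df['a'].dropna().str.contains('hph')

# Keep the labels where the mask is True and look them up in the original frame.
matching_labels = mask[mask].index
result = df.loc[matching_labels]
print(result)
```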

Another way to handle the null values would be to use notnull:

In [43]:
(df['a'].notnull()) & (df['a'].str.contains('hph'))

Out[43]:
0     True
1    False
2     True
3    False
Name: a, dtype: bool

However, I think passing na=False is cleaner (see @jezrael's answer).

EdChum answered Oct 25 '22 05:10