
delete rows containing numeric values in strings from pandas dataframe

Tags: python, pandas

I have a pandas DataFrame with two columns, type and text. The text column contains string values. How can I delete the rows whose text value contains any digits? For example:

`ABC 1.3.2`, `ABC12`, `2.2.3`, `ABC 12 1`

I tried the following, but it raises an error. Any idea why?

df.drop(df[bool(re.match('^(?=.*[0-9]$)', df['text'].str))].index)
jacob mathew asked Jun 11 '18 18:06



2 Answers

In your case, I think it's better to use simple indexing rather than drop. For example:

>>> df
       text type
0       abc    b
1    abc123    a
2       cde    a
3  abc1.2.3    b
4     1.2.3    a
5       xyz    a
6    abc123    a
7      9999    a
8     5text    a
9      text    a


>>> df[~df.text.str.contains(r'[0-9]')]
   text type
0   abc    b
2   cde    a
5   xyz    a
9  text    a

That selects only the rows whose text contains no digits.

To explain:

df.text.str.contains(r'[0-9]')

returns a boolean Series that is True wherever the text contains at least one digit:

0    False
1     True
2    False
3     True
4     True
5    False
6     True
7     True
8     True
9    False

You can negate this mask with ~ and use it to index your DataFrame, which keeps only the rows where the mask is False.
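
As for the error in the question: `re.match` expects a single string, but `df['text'].str` is the string accessor object, so the call raises a TypeError before any row is tested. Separately, the pattern `^(?=.*[0-9]$)` only matches text that ends in a digit. The filter above can be sketched as a self-contained script; the DataFrame is rebuilt from the example output, and the variable names `has_digit`, `kept` and `dropped` are illustrative only:

```python
import pandas as pd

# Rebuild the example frame shown above
df = pd.DataFrame({
    "text": ["abc", "abc123", "cde", "abc1.2.3", "1.2.3",
             "xyz", "abc123", "9999", "5text", "text"],
    "type": ["b", "a", "a", "b", "a", "a", "a", "a", "a", "a"],
})

# True for every row whose text contains at least one digit
has_digit = df["text"].str.contains(r"[0-9]")

# Option 1: boolean indexing, keep the rows without digits
kept = df[~has_digit]

# Option 2: same result with drop, passing the index labels of the digit rows
dropped = df.drop(df[has_digit].index)

print(kept)
```

If the text column can hold missing values, passing `na=False` to `str.contains` treats them as "no digit" so the mask stays boolean.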

sacuL answered Sep 22 '22 12:09



Data from jpp

s[s.str.isalpha()]
Out[261]: 
0    ABC
2    DEF
6    GHI
dtype: object
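
Note that `str.isalpha()` is stricter than a digit check: it is False for any string containing a space or punctuation, not only for strings with digits. A small sketch of the difference, using a hypothetical Series (the data this answer refers to is not shown here):

```python
import pandas as pd

# Hypothetical sample data, chosen to show where the two filters differ
s = pd.Series(["ABC", "ABC 1.3.2", "DEF GHI", "ABC12", "x-y"])

# Keeps only strings made up entirely of letters
only_letters = s[s.str.isalpha()]

# Keeps every string that has no digit, spaces and punctuation allowed
no_digits = s[~s.str.contains(r"\d")]

print(only_letters.tolist())
print(no_digits.tolist())
```

So `isalpha` fits when the values are single words; for free text with spaces, the negated `str.contains` mask is likely the safer choice.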
BENY answered Sep 19 '22 12:09
