
Filtering out rows with non-alphanumeric characters

I am trying to build a new DataFrame from an existing one, keeping only the rows where the values in a certain column (whose values are strings) do not contain a certain character.

For example, if the character we don't want is '(':

Original dataframe:

   some_col my_column
0         1      some
1         2      word
2         3    hello(

New dataframe:

   some_col my_column
0         1      some
1         2      word

I have tried df.loc['(' not in df['my_column']], but this does not work since df['my_column'] is a Series object.

I have also tried: df.loc[not df.my_column.str.contains('(')], which also does not work.

nmog asked May 30 '18 02:05



1 Answer

You're looking for str.isalpha, which is True only for strings made up entirely of letters:

df[df.my_column.str.isalpha()]

   some_col my_column
0         1      some
1         2      word

A similar method is str.isalnum, which keeps rows made up entirely of letters and digits.
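As a minimal sketch of the difference between the two methods, the snippet below rebuilds the example frame from the question and applies both filters. The extra row "abc123" is an assumption added purely for illustration; it is not in the original data.

```python
import pandas as pd

# Sample data mirroring the question. The "abc123" row is a hypothetical
# addition so that isalpha and isalnum give different results.
df = pd.DataFrame({
    "some_col": [1, 2, 3, 4],
    "my_column": ["some", "word", "hello(", "abc123"],
})

# str.isalpha() is True only when every character in the string is a letter.
letters_only = df[df["my_column"].str.isalpha()]

# str.isalnum() also accepts digits, so "abc123" passes this filter.
letters_and_digits = df[df["my_column"].str.isalnum()]

print(letters_only)
print(letters_and_digits)
```

If the column may contain missing values, the mask can contain missing entries as well, and you would likely need to fill them (for example with .fillna(False)) before indexing.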

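If you also want to allow whitespace, use a negated character class. Note that \w matches letters, digits, and the underscore, so this keeps only rows made up of those characters and whitespace: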

df[~df.my_column.str.contains(r'[^\w\s]')]

   some_col my_column
0         1      some
1         2      word
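For the single-character case in the question, a rough sketch along the same lines is shown below. It assumes the example frame from the question. Two details matter: str.contains treats its pattern as a regular expression by default, so a bare '(' is not a valid pattern and needs regex=False (or escaping), and a boolean Series is inverted with ~ rather than with Python's not.

```python
import pandas as pd

# Example frame reconstructed from the question.
df = pd.DataFrame({
    "some_col": [1, 2, 3],
    "my_column": ["some", "word", "hello("],
})

# regex=False makes "(" a literal character; with the default regex=True,
# a lone "(" is an invalid regular expression.
has_paren = df["my_column"].str.contains("(", regex=False)

# ~ inverts the boolean Series element-wise; `not has_paren` would raise
# because the truth value of a whole Series is ambiguous.
no_paren = df[~has_paren]

# Character-class variant: drop any row containing a character that is
# neither a word character (\w) nor whitespace (\s).
clean = df[~df["my_column"].str.contains(r"[^\w\s]")]

print(no_paren)
print(clean)
```

Both filters drop the "hello(" row here, but the first targets one specific character while the second rejects any punctuation or symbol.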

Lastly, if you are looking to remove punctuation as a whole, I've written a Q&A that might be a useful read: Fast punctuation removal with pandas

cs95 answered Oct 17 '22 13:10
