I am trying to get a DataFrame from an existing DataFrame that contains only the rows where the values in a certain column (whose values are strings) do not contain a certain character.
For example, if the character we don't want is '(':
Original DataFrame:
   some_col my_column
0         1      some
1         2      word
2         3    hello(
New DataFrame:
   some_col my_column
0         1      some
1         2      word
I have tried df.loc['(' not in df['my_column']], but this does not work since df['my_column'] is a Series object.
I have also tried df.loc[not df.my_column.str.contains('(')], which also does not work.
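Both attempts fail for reasons that can be reproduced on the sample data. The sketch below assumes the DataFrame from the question; the variable names are illustrative only. In short: in on a Series checks the index labels rather than the values, str.contains treats '(' as the start of a regex group by default, and not cannot negate a Series element-wise.

```python
import re

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"some_col": [1, 2, 3], "my_column": ["some", "word", "hello("]})

# 1. `in` on a Series tests the index labels (0, 1, 2), not the values,
#    so this yields one bool instead of a per-row mask.
paren_in_series = "(" in df["my_column"]
print(paren_in_series)

errors = []

# 2. By default str.contains treats its pattern as a regular expression,
#    and a lone "(" is an unterminated group, so compiling it fails.
try:
    df["my_column"].str.contains("(")
except re.error as exc:
    errors.append("re.error")
    print("regex error:", exc)

# 3. `not` needs a single truth value; a boolean Series has one per row,
#    so pandas raises ValueError instead of negating element-wise.
try:
    not (df["my_column"] == "some")
except ValueError as exc:
    errors.append("ValueError")
    print("ValueError:", exc)
```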
You're looking for str.isalpha, which is True only for strings made up entirely of letters:
df[df.my_column.str.isalpha()]
   some_col my_column
0         1      some
1         2      word
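A self-contained version of the isalpha filter, assuming the sample data from the question. Note that isalpha is stricter than "does not contain '('": values such as 'two words', 'word2', or an empty string would also be dropped.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"some_col": [1, 2, 3], "my_column": ["some", "word", "hello("]})

# str.isalpha() is True only for non-empty strings made up entirely of
# letters, so "hello(" (which contains "(") is filtered out.
letters_only = df[df["my_column"].str.isalpha()]
print(letters_only)
```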
A similar method is str.isalnum, if you want to retain both letters and digits.
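To compare the two methods, here is a small sketch on a hypothetical Series (not data from the question). Neither method accepts whitespace or punctuation.

```python
import pandas as pd

# Hypothetical values chosen to show where the two methods differ.
s = pd.Series(["some", "word2", "hello(", "two words"])

# isalpha: letters only; isalnum: letters or digits.
alpha_flags = s.str.isalpha().tolist()
alnum_flags = s.str.isalnum().tolist()
print(alpha_flags)  # [True, False, False, False]
print(alnum_flags)  # [True, True, False, False]
```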
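If you want to keep strings made up only of word characters (letters, digits, and underscore, which \w matches) and whitespace, i.e. drop any row containing punctuation or another symbol, negate str.contains with a character class:
df[~df.my_column.str.contains(r'[^\w\s]')]
   some_col my_column
0         1      some
1         2      word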
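If the goal is only to exclude one specific character, as in the question, a narrower alternative (sketched here as an assumption, not the method shown above) is to match the character literally with regex=False and negate the mask with ~. The data is the question's sample plus one hypothetical row, 'two words', which this filter keeps but isalpha would drop.

```python
import pandas as pd

# Question's sample data plus one hypothetical row ("two words").
df = pd.DataFrame(
    {"some_col": [1, 2, 3, 4], "my_column": ["some", "word", "hello(", "two words"]}
)

# regex=False treats "(" as a literal character instead of an (unbalanced)
# regex group; ~ then negates the boolean mask element-wise.
mask = df["my_column"].str.contains("(", regex=False)
without_paren = df[~mask]
print(without_paren)
```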
Lastly, if you are looking to remove punctuation as a whole, I've written a Q&A here which might be a useful read: Fast punctuation removal with pandas