I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, e.g.:
df['strings'] = ["a#bc1!","a(b$c"]
After running the code, I expect:
print(df['strings'])  # ['abc', 'abc']
I've tried:
df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")
But this didn't work and I feel that there should be a more efficient way to do this using regex. Any help would be very appreciated.
A simple solution is to use a regular expression to remove non-alphanumeric characters from a string. The idea is to use the special sequence \W, which matches any character that is not a word character. Note that the underscore counts as a word character, so \W leaves underscores in place.
The re module in Python provides regular expression operations for processing text in strings. Using re.compile() together with the sub() method of the compiled pattern, you can remove all non-alphabetic characters from a given string.
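As a minimal sketch of the compile() and sub() approach described above, using only the standard library. The helper name strip_non_alnum is made up for illustration, and the pattern here keeps digits as well as letters:

```python
import re

# Matches any run of characters that are not ASCII letters or digits.
NON_ALNUM = re.compile(r'[^a-zA-Z0-9]+')

def strip_non_alnum(text):
    """Return text with every non-alphanumeric character removed."""
    return NON_ALNUM.sub('', text)

cleaned = [strip_non_alnum(s) for s in ["a#bc1!", "a(b$c"]]
print(cleaned)  # ['abc1', 'abc']
```

Compiling the pattern once is mainly a convenience when the same expression is applied to many strings.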
Use str.replace.
df
  strings
0  a#bc1!
1   a(b$c
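# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df.strings.str.replace(r'[^a-zA-Z]', '', regex=True)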
0 abc
1 abc
Name: strings, dtype: object
To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need:
df.strings.str.replace(r'\W', '', regex=True)
0 abc1
1 abc
Name: strings, dtype: object
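One caveat with \W is worth seeing in isolation. The sketch below uses plain re.sub rather than pandas, on the assumption that the pandas call treats the pattern the same way for ordinary Python strings. The sample string is invented for illustration:

```python
import re

sample = "a_b#c1\u00e9!"  # contains an underscore and an accented letter

# \W treats the underscore and non-ASCII letters as word characters, so both stay.
unicode_result = re.sub(r'\W', '', sample)

# With re.ASCII, \w is limited to [a-zA-Z0-9_]; the underscore still stays.
ascii_result = re.sub(r'\W', '', sample, flags=re.ASCII)

# An explicit class removes everything except ASCII letters and digits.
strict_result = re.sub(r'[^a-zA-Z0-9]', '', sample)

print(unicode_result, ascii_result, strict_result)
```

If underscores or accented letters must also go, the explicit character class is the safer choice.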
Since you wrote alphanumeric, you need to add 0-9 to the regex. But maybe you only wanted alphabetic characters...
import pandas as pd
ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})
ded.strings.str.replace(r'[^a-zA-Z0-9]', '', regex=True)
But it's basically what COLDSPEED wrote
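As for why the original attempt had no effect: Series.replace without regex=True only replaces cells whose entire value equals one of the listed items, so it never touches characters inside a longer string. If you would rather remove a fixed list of characters than "everything non-alphanumeric", one possible sketch is to build a character class from that list with re.escape. It is shown here with plain re.sub; the names unwanted and pattern are illustrative, and the same pattern could presumably be passed to df['strings'].str.replace(pattern, '', regex=True):

```python
import re

# The characters from the question's original attempt.
unwanted = [',', '.', '/', '"', ':', ';', '!', '@', '#', '$', '%',
            "'", '*', '(', ')', '&']

# re.escape guards characters that have a special meaning in a regex.
pattern = '[' + ''.join(re.escape(ch) for ch in unwanted) + ']'

result = [re.sub(pattern, '', s) for s in ["a#bc1!", "a(b$c"]]
print(result)  # ['abc1', 'abc']
```

Either way, remember that the pandas string methods return a new Series, so assign the result back to the column to keep it.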