
How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, e.g.:

df['strings'] = ["a#bc1!","a(b$c"]

Expected output:

print(df['strings']): ['abc','abc']

I've tried:

df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")

But this didn't work and I feel that there should be a more efficient way to do this using regex. Any help would be very appreciated.

TheSaint321 asked Sep 15 '17 13:09


People also ask

How do you remove non-alphanumeric characters from a string in Python?

A simple solution is to use a regular expression. The special sequence \W matches any character that is not a word character; letters, digits, and the underscore all count as word characters, so underscores are kept.

How do you remove non-alphabetic characters from a string in Python?

Python's re module provides regular expression operations for processing text. Using compile() together with sub() removes all non-alphabetic characters from a given string.
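As a minimal sketch of the compile() and sub() approach described above, using only the standard library: the sample strings are taken from the question, while the names NON_ALNUM, NON_ALPHA, alnum_only, and alpha_only are made up for illustration.

```python
import re

# Compile once so each pattern can be reused across many strings.
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")  # anything that is not an ASCII letter or digit
NON_ALPHA = re.compile(r"[^a-zA-Z]")     # anything that is not an ASCII letter

samples = ["a#bc1!", "a(b$c"]

alnum_only = [NON_ALNUM.sub("", s) for s in samples]
alpha_only = [NON_ALPHA.sub("", s) for s in samples]

print(alnum_only)  # ['abc1', 'abc']
print(alpha_only)  # ['abc', 'abc']

# \W treats the underscore as a word character, so it survives.
print(re.sub(r"\W", "", "a_b#c"))  # a_bc
```

An explicit character class such as [^a-zA-Z0-9] is the safer choice when underscores must also be removed.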


2 Answers

Use str.replace.

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object

To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need \W, which also keeps underscores:

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object 
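
For anyone who wants to run this end to end, here is a self-contained sketch. It assumes a recent pandas release and passes regex=True explicitly, because the default for Series.str.replace changed to regex=False in pandas 2.0. The variable names letters_only and alphanumeric are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"strings": ["a#bc1!", "a(b$c"]})

# regex=True makes the pattern a regular expression rather than a literal string.
letters_only = df["strings"].str.replace(r"[^a-zA-Z]", "", regex=True)
alphanumeric = df["strings"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)

print(letters_only.tolist())  # ['abc', 'abc']
print(alphanumeric.tolist())  # ['abc1', 'abc']

# str.replace returns a new Series; assign it back to keep the cleaned values.
df["strings"] = alphanumeric
```

Without the assignment on the last line the original column is left unchanged.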
cs95 answered Oct 08 '22 04:10


Since you wrote alphanumeric, you need to add 0-9 in the regex. But maybe you only wanted alphabetic...

import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

ded.strings.str.replace('[^a-zA-Z0-9]', '', regex=True)

But it's basically what COLDSPEED wrote

StefanK answered Oct 08 '22 03:10