How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, e.g.:

df['strings'] = ["a#bc1!","a(b$c"]

Expected output:

print(df['strings'])  # ['abc','abc']

I've tried:

df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")

But this didn't work, and I feel there should be a more efficient way to do this using regex. Any help would be much appreciated.

How do you remove non-alphanumeric characters from a string in Python?

A simple solution is to use a regular expression to remove non-alphanumeric characters from a string. The idea is to use the special sequence \W, which matches any character that is not a word character. Word characters are letters, digits, and the underscore, so underscores are kept.
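A minimal sketch of that idea with the standard `re` module. The helper names `strip_non_word` and `strip_non_alnum` are made up for illustration; the second one uses an explicit ASCII character class in case underscores should go as well.

```python
import re

def strip_non_word(text):
    # \W matches any character that is not a word character
    # (letters, digits, and the underscore count as word characters).
    return re.sub(r"\W", "", text)

def strip_non_alnum(text):
    # Explicit ASCII class: everything except a-z, A-Z, and 0-9 is removed,
    # including the underscore.
    return re.sub(r"[^a-zA-Z0-9]", "", text)

print(strip_non_word("a#bc1!"))   # abc1
print(strip_non_word("a_b c"))    # a_bc
print(strip_non_alnum("a_b c"))   # abc
```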

How do you remove non alphabetic words from a string in Python?

Python's re module provides regular expression operations for processing text. Using compile() together with sub(), you can remove all non-alphabetic characters from a given string.
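Roughly, the compile() plus sub() combination could look like the sketch below. The names `NON_ALPHA` and `keep_letters` are hypothetical, and the pattern assumes only ASCII letters should be kept.

```python
import re

# Compile the pattern once so it can be reused across many strings.
# It matches runs of characters that are not ASCII letters.
NON_ALPHA = re.compile(r"[^a-zA-Z]+")

def keep_letters(text):
    # Replace every non-letter run with the empty string.
    return NON_ALPHA.sub("", text)

print(keep_letters("a#bc1!"))  # abc
print(keep_letters("a(b$c"))   # abc
```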

Use str.replace.

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object

To retain alphanumeric characters (not just letters, as your expected output suggests), you can use \W, which also keeps underscores:

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object

Since you wrote alphanumeric, you need to add 0-9 to the character class in the regex. But maybe you only wanted alphabetic characters...

import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

ded.strings.str.replace('[^a-zA-Z0-9]', '', regex=True)

But it's basically what COLDSPEED wrote.
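For anyone comparing the patterns from both answers, here is a small side-by-side sketch. The third row (`x_y 2`) is not from the question; it is added only to show how `\W` differs from an explicit character class. `regex=True` is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# The first two rows come from the question; the third is an illustrative extra.
df = pd.DataFrame({"strings": ["a#bc1!", "a(b$c", "x_y 2"]})

# Keep ASCII letters only.
letters_only = df["strings"].str.replace(r"[^a-zA-Z]", "", regex=True)

# Keep ASCII letters and digits.
alnum_only = df["strings"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)

# Keep word characters: letters, digits, and the underscore.
word_chars = df["strings"].str.replace(r"\W", "", regex=True)

print(letters_only.tolist())  # ['abc', 'abc', 'xy']
print(alnum_only.tolist())    # ['abc1', 'abc', 'xy2']
print(word_chars.tolist())    # ['abc1', 'abc', 'x_y2']
```

To keep the result, assign it back, for example `df["strings"] = alnum_only`, since `str.replace` returns a new Series rather than modifying the column in place.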

Tags: python, regex, pandas, dataframe

TheSaint321

2 Answers

cs95

StefanK
