Python, remove all non-alphabet chars from string

I am writing a Python MapReduce word count program. The problem is that there are many non-alphabet chars strewn about in the data. I found the post "Stripping everything but alphanumeric chars from a string in Python", which shows a nice solution using regex, but I am not sure how to implement it.

def mapfn(k, v):
    print v
    import re, string
    pattern = re.compile('[\W_]+')
    v = pattern.match(v)
    print v
    for w in v.split():
        yield w, 1

I'm afraid I am not sure how to use the re library, or even regex for that matter. I am not sure how to apply the regex pattern to the incoming string v (a line of a book) so that I get the line back without any non-alphanumeric chars.

Suggestions?

asked Mar 20 '14 00:03

KDecker

2 Answers

Use re.sub

import re

regex = re.compile('[^a-zA-Z]')
# First parameter is the replacement, second parameter is your input string
regex.sub('', 'ab3d*E')
# Out: 'abdE'
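To tie this back to the question: re.match only tests the start of the string and returns a match object (or None), not a cleaned string, which is why calling .split() on its result fails. Below is a minimal sketch of the mapper using sub instead. It assumes Python 3 syntax and keeps the mapfn(k, v) signature from the question. The NON_LETTERS name and the choice to substitute a space rather than an empty string are my own assumptions. The space matters for word counting, because '[^a-zA-Z]' also matches whitespace, so replacing with '' would glue neighbouring words together.

```python
import re

# Hypothetical module-level pattern: one or more characters that are
# not ASCII letters. Note that this includes spaces.
NON_LETTERS = re.compile(r'[^a-zA-Z]+')


def mapfn(k, v):
    # Replace each run of non-letters with a single space so that word
    # boundaries survive, then split on whitespace.
    cleaned = NON_LETTERS.sub(' ', v)
    for w in cleaned.split():
        yield w, 1


words = [w for w, _ in mapfn(0, "hello, world! 42 times")]
print(words)  # ['hello', 'world', 'times']
```

With this approach an apostrophe also splits a word, so "it's" is counted as "it" and "s".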

Alternatively, if you only want to remove a certain set of characters (an apostrophe might be okay in your input, for example):

regex = re.compile(r'[,.!?]')  # etc.
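As a rough illustration of this narrower pattern, the sketch below strips only the four listed punctuation marks and leaves apostrophes, digits, and spaces alone. The helper name strip_punctuation is hypothetical, and the character set is just the one from the answer. Extend it to whatever your data actually contains.

```python
import re

# Only these four characters are removed; everything else is kept.
PUNCTUATION = re.compile(r'[,.!?]')


def strip_punctuation(text):
    # Delete each listed punctuation character from the text.
    return PUNCTUATION.sub('', text)


result = strip_punctuation("Don't panic, it's fine!")
print(result)  # Don't panic it's fine
```

Because spaces are not in the character class, an empty replacement is safe here and the result can be split into words directly.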

answered Oct 18 '22 08:10

limasxgoesto0

If you prefer not to use regex, you might try the following, where s is your input string:

''.join([i for i in s if i.isalpha()])
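One difference between the two approaches is worth knowing. In Python 3, str.isalpha() is Unicode-aware, so it keeps accented and other non-ASCII letters, while the '[^a-zA-Z]' pattern keeps only ASCII letters. The sketch below puts the two side by side. The function names are hypothetical and Python 3 is assumed.

```python
import re


def letters_only_isalpha(s):
    # Keeps every character that Unicode classifies as a letter.
    return ''.join(ch for ch in s if ch.isalpha())


def letters_only_ascii(s):
    # Keeps only the ASCII letters a-z and A-Z.
    return re.sub(r'[^a-zA-Z]', '', s)


sample = 'caf\u00e9 42'  # 'café 42', with the accent written as an escape
print(letters_only_isalpha('ab3d*E'))  # abdE
print(letters_only_isalpha(sample))    # café
print(letters_only_ascii(sample))      # caf
```

Like the regex with an empty replacement, the join removes spaces too, so for word counting it is better applied to each word after splitting than to the whole line.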

answered Oct 18 '22 08:10

Tad

Tags: python, regex
