How to remove '\xe2' from a list

Tags: python, regex

I am new to Python and am using it with NLTK in my project. After word-tokenizing the raw data obtained from a webpage, I got a list containing '\xe2', '\xe3', '\x98', etc. I do not need these and want to delete them.

I simply tried

if '\x' in a

and

if a.startswith('\xe')

and each of these gives me the error "invalid \x escape".
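
For context, a minimal sketch of why these checks cannot work (written for Python 3, while the question uses Python 2.6; the token list is a hypothetical stand-in for the tokenizer output). In a string literal, \x must be followed by exactly two hex digits, so '\x' and '\xe' are rejected before the code even runs. Also, '\xe2' is how Python displays a single character with code 0xE2, not the four characters backslash, x, e, 2, so there is no literal "\x" inside it to search for. Testing the character codes works instead:

```python
# '\xe2' is ONE character (code 0xE2), displayed with an escape by repr().
token = '\xe2'
print(len(token))          # 1
print(token == chr(0xE2))  # True
print('\\x' in token)      # False: there is no literal backslash in it


def is_ascii(s):
    """Return True if every character in s is in the ASCII range (0-127)."""
    return all(ord(ch) < 128 for ch in s)


# Hypothetical tokens standing in for the NLTK word_tokenize output.
tokens = ['hello', '\xe2', 'world', '\xe3', '\x98']
cleaned = [t for t in tokens if is_ascii(t)]
print(cleaned)  # ['hello', 'world']
```

This drops any token containing a non-ASCII character; on Python 3.7+ the built-in `str.isascii()` performs the same check.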

When I try a regular expression instead,

re.search('^\\x',a)

I get

Traceback (most recent call last):
  File "<pyshell#83>", line 1, in <module>
    print re.search('^\\x',a)
  File "C:\Python26\lib\re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "C:\Python26\lib\re.py", line 245, in _compile
    raise error, v # invalid expression
error: bogus escape: '\\x'

So even re.search('^\\x', a) fails to identify them.
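
A minimal sketch of what goes wrong with the regular expression (Python 3; the token list is hypothetical). The literal '^\\x' hands the regex engine the pattern ^\x, and in regex syntax \x is also an incomplete hex escape, which is what "bogus escape" refers to. A complete escape, such as a character class covering everything outside the ASCII range, does work:

```python
import re

# '^\\x' reaches the regex engine as ^\x: an incomplete hex escape in
# regex syntax, so compiling the pattern raises re.error.
try:
    re.compile('^\\x')
except re.error as exc:
    print('re.error:', exc)

# Complete escapes are fine: this class matches any non-ASCII character.
NON_ASCII = re.compile(r'[^\x00-\x7f]')

# Hypothetical tokens standing in for the tokenizer output.
tokens = ['hello', '\xe2', 'caf\xe9', '\x98', 'world']

# Drop every token that contains at least one non-ASCII character.
kept = [t for t in tokens if not NON_ASCII.search(t)]
print(kept)  # ['hello', 'world']
```

Note that this removes whole tokens such as 'caf\xe9' too, not just the stray bytes.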

I am confused by this, and even googling didn't help (I might be missing something). Please suggest a simple way to remove such strings from the list, and explain what was wrong with the above.

Thanks in advance!


asked Jul 25 '10 11:07

silentNinJa

1 Answer

You can use unicode(a, 'ascii', 'ignore') to remove all non-ASCII characters in the string at once.
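
The `unicode()` built-in exists only in Python 2. A minimal sketch of a Python 3 counterpart, assuming the tokens may be either `bytes` or `str` (the helper name `strip_non_ascii` and the token list are hypothetical): decoding or encoding with the 'ascii' codec and the 'ignore' error handler silently drops anything outside ASCII. Tokens made only of such characters end up empty and can then be filtered out of the list:

```python
def strip_non_ascii(text):
    """Hypothetical helper: drop non-ASCII characters, like Python 2's
    unicode(a, 'ascii', 'ignore')."""
    if isinstance(text, bytes):
        # Raw bytes: decode as ASCII, skipping bytes >= 0x80.
        return text.decode('ascii', 'ignore')
    # Text: round-trip through ASCII, skipping characters it cannot encode.
    return text.encode('ascii', 'ignore').decode('ascii')


tokens = ['hello', '\xe2', b'\xe3', 'caf\xe9', '\x98']
cleaned = [strip_non_ascii(t) for t in tokens]
print(cleaned)  # ['hello', '', '', 'caf', '']

# Tokens that consisted only of non-ASCII characters are now empty; drop them.
cleaned = [t for t in cleaned if t]
print(cleaned)  # ['hello', 'caf']
```

Unlike filtering whole tokens, this keeps the ASCII part of mixed tokens ('caf\xe9' becomes 'caf'), which may or may not be what you want.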

answered Oct 02 '22 06:10

cypheon

Related questions
                            
                                netcdf4 extract for subset of lat lon
                            
                                Get highest duration from a list of strings
                            
                                AttributeError: partially initialized module 'turtle' has no attribute 'Turtle' (most likely due to a circular import)
                            
                                Where can i get technical information on how the internals of Django works?
                            
                                How to find length of an element in a list?
                            
                                Converting a String to List in Python
                            
                                Python Chain getattr as a string
                            
                                Converting exponential to float
                            
                                Will python class __init__ method implicitly return None?
                            
                                Is there a way in beautiful soup to count the number of tags in a html page
                            
                                Most Pythonic Way to Build Dictionary From Single List
                            
                                Why won't my Flask app connect to my CSS files? [duplicate]
                            
                                Discord Bot Role Mentioning
                            
                                Framework/CMS suggestions for enterprise website & intranet (I've got to convince the president its solid!) [closed]
                            
                                Understanding Zope internals, from Django eyes
                            
                                How to load compiled python modules from memory?
                            
                                In Python, how do I make a datetime that is 15 minutes from now? 1 hour from now? [duplicate]
                            
                                Python - How To Rename A Text File With DateTime
                            
                                Python Four Digits Counter
                            
                                Python unicode in Mac os X terminal

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

how to remove '\xe2' from a list

Tags:

python

regex

silentNinJa

People also ask

1 Answers

cypheon

Recent Activity

Donate For Us