
Python - Replace non-ascii character in string (»)

I need to replace the character "»" in a string with whitespace, but I keep getting an error. This is the code I use:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# other code

soup = BeautifulSoup(data, 'lxml')
mystring = soup.find('a').text.replace(' »','')

UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 13: ordinal not in range(128)

But if I test it with this other script:

# -*- coding: utf-8 -*-
a = "hi »"
b = a.replace('»','') 

it works. Why is that?

Hyperion asked Nov 29 '16 17:11




1 Answer

To replace text with str.replace() in Python 2, first decode the byte string to unicode, then replace the text, and finally encode the result back to the original encoding:

>>> a = "hi »"
>>> a.decode('utf-8').replace("»".decode('utf-8'), "").encode('utf-8')
'hi '
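
For the BeautifulSoup case in the question, the value returned by .text is already a unicode string, so the decode step should not be needed there. A minimal sketch, assuming the failure comes from mixing byte strings and unicode strings; link_text is a hypothetical stand-in for soup.find('a').text, and the u'' prefix keeps the snippet valid on both Python 2.7 and Python 3:

```python
# Hypothetical stand-in for soup.find('a').text, which is a unicode string.
link_text = u"next page \u00bb"

# Both the search text and the replacement are unicode literals, so Python
# never has to convert between bytes and unicode with the ASCII codec.
cleaned = link_text.replace(u" \u00bb", u"")

print(repr(cleaned))
```

Here u"\u00bb" is the "»" character written as an escape, which sidesteps any dependence on the source file's encoding.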

You may also use the following regex to remove all non-ASCII characters from the string:

>>> import re
>>> re.sub(r'[^\x00-\x7f]',r'', 'hi »')
'hi '
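
The same idea can be wrapped in a small helper for unicode input. This is only a sketch: the names strip_non_ascii and strip_non_ascii_via_codec are made up for illustration, and the second function shows an alternative that relies on the ASCII codec's "ignore" error handler instead of a regex:

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def strip_non_ascii(text):
    """Remove every non-ASCII character from a unicode string (regex version)."""
    return NON_ASCII.sub(u"", text)


def strip_non_ascii_via_codec(text):
    """Same result, obtained by encoding to ASCII and dropping what does not fit."""
    return text.encode("ascii", "ignore").decode("ascii")


print(repr(strip_non_ascii(u"hi \u00bb")))
print(repr(strip_non_ascii_via_codec(u"hi \u00bb")))
```

Both helpers drop every non-ASCII character, not just "»", so they are only appropriate when accented letters and other symbols are also unwanted.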

Moinuddin Quadri answered Sep 23 '22 14:09