Python - Replace non-ascii character in string (»)

Tags: python, string, regex, encoding, decoding

I need to replace the character "»" in a string with whitespace, but I keep getting an error. This is the code I use:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# other code

soup = BeautifulSoup(data, 'lxml')
mystring = soup.find('a').text.replace(' »','')

UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 13: ordinal not in range(128)

But if I test it with this other script:

# -*- coding: utf-8 -*-
a = "hi »"
b = a.replace('»','')

It works. Why is that?

asked Nov 29 '16 17:11

Hyperion

1 Answer

In Python 2, to replace a non-ASCII character in a byte string with str.replace(), first decode the string to Unicode, then replace the text, and finally encode the result back to the original encoding (UTF-8 here):

>>> a = "hi »"
>>> a.decode('utf-8').replace("»".decode('utf-8'), "").encode('utf-8')
'hi '
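
For comparison, here is a minimal sketch of the same replacement under Python 3, which the original post does not cover. It assumes the text is already a Unicode str (as BeautifulSoup's .text normally is), so no decode/encode round trip is needed. The variable names are illustrative only:

```python
# Python 3 sketch (an assumption; the snippets above are Python 2).
# In Python 3, str is Unicode, so the character can be replaced directly.
text = "hi \u00bb"                       # "\u00bb" is the character »
cleaned = text.replace("\u00bb", "")
print(repr(cleaned))                     # 'hi '

# If the data arrives as UTF-8 bytes instead, decode first and encode afterwards,
# mirroring the decode -> replace -> encode steps described above.
raw = text.encode("utf-8")
cleaned_bytes = raw.decode("utf-8").replace("\u00bb", "").encode("utf-8")
print(cleaned_bytes)                     # b'hi '
```

Writing the character as an escape ("\u00bb") keeps the example independent of the source file's encoding.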

You can also use the following regex to remove all non-ASCII characters from the string:

>>> import re
>>> re.sub(r'[^\x00-\x7f]',r'', 'hi »')
'hi '
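
The same regex can be wrapped in a small helper, sketched here for Python 3 as an assumption. The name strip_non_ascii is hypothetical and does not come from any library. Note that the pattern removes every non-ASCII character, not just "»", so accented letters are dropped as well:

```python
import re

# Matches any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")

def strip_non_ascii(text):
    """Return text with every non-ASCII character removed."""
    return NON_ASCII.sub("", text)

print(repr(strip_non_ascii("hi \u00bb")))        # 'hi '
print(repr(strip_non_ascii("caf\u00e9 \u00bb")))  # 'caf ' (the é is removed too)
```

If only "»" should go, a plain replace() is the safer choice because it leaves other non-ASCII text untouched.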

answered Sep 23 '22 14:09

Moinuddin Quadri
