How do I replace Unicode characters in a string with something else in Python?

Tags: python, unicode

I have a string that I got from reading an HTML web page. The page has bulleted lists, so the text contains the bullet symbol "•". The text is the HTML source of the page, fetched in Python 2.7 with urllib2.urlopen(webaddress).read().

I know the bullet's Unicode code point is U+2022, but how do I actually replace that character with something else?

I tried str.replace("•", "something"), but it does not appear to work. How do I do this?

904

asked Oct 26 '12 20:10

Rolando

1 Answer

1. Decode the string to Unicode. Assuming it's UTF-8-encoded:
   ```
   str.decode("utf-8")
   ```
2. Call the replace method, and be sure to pass it a Unicode string as its first argument:
   ```
   str.decode("utf-8").replace(u"\u2022", "*")
   ```
3. Encode back to UTF-8, if needed:
   ```
   str.decode("utf-8").replace(u"\u2022", "*").encode("utf-8")
   ```

(Fortunately, Python 3 puts a stop to this mess. Step 3 should really only be performed just before I/O. Also, note that naming a string str shadows the built-in type str.)
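The three steps can also be wrapped in a small helper. This is only a sketch: the function name `replace_bullets`, its parameters, and the sample bytes are made up for illustration, and it assumes the page really is UTF-8-encoded. It is written to run on both Python 2.7 and Python 3.3+, and it uses a name other than `str` for the data so the built-in is not shadowed.

```python
# -*- coding: utf-8 -*-
# Sketch only: `replace_bullets` and the sample bytes below are illustrative,
# and the input is assumed to be UTF-8-encoded.

BULLET = u"\u2022"  # U+2022 BULLET


def replace_bullets(raw_bytes, replacement=u"*", encoding="utf-8"):
    """Decode bytes to Unicode text and replace every U+2022 bullet."""
    text = raw_bytes.decode(encoding)         # step 1: bytes -> Unicode
    return text.replace(BULLET, replacement)  # step 2: replace on Unicode text


# In UTF-8, U+2022 is encoded as the three bytes E2 80 A2.
raw = b"\xe2\x80\xa2 first item\n\xe2\x80\xa2 second item"

cleaned = replace_bullets(raw)
print(cleaned)

# Step 3: encode back to bytes only when the result is about to be written out.
encoded = cleaned.encode("utf-8")
```

Keeping the text as Unicode until the last moment means any further replacements work on characters rather than on raw bytes.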

146

answered Sep 21 '22 19:09

Fred Foo
