
How to replace Unicode characters in a string with something else in Python?

Tags:

python

unicode

I have a string that I got from reading an HTML webpage. It contains a symbol like "•" because the page has a bulleted list. Note that the text is the HTML source of a webpage, read with Python 2.7's urllib2.urlopen(webaddress).read().

I know the bullet character is U+2022, but how do I actually replace that Unicode character with something else?

I tried doing str.replace("•", "something")

but it does not appear to work... how do I do this?

Rolando asked Oct 26 '12 20:10



1 Answer

  1. Decode the string to Unicode. Assuming it's UTF-8-encoded:

    str.decode("utf-8")

  2. Call the replace method and be sure to pass it a Unicode string as its first argument:

    str.decode("utf-8").replace(u"\u2022", "*")

  3. Encode back to UTF-8, if needed:

    str.decode("utf-8").replace(u"\u2022", "*").encode("utf-8")

(Fortunately, Python 3 puts a stop to this mess by making str a Unicode text type and keeping raw bytes in a separate bytes type. Step 3 should really only be performed just prior to I/O. Also, note that naming a string str shadows the built-in type str.)
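For comparison, the same three steps in Python 3 might look like the sketch below. It assumes the page bytes are UTF-8 encoded. The helper name `replace_bullets` and the sample markup are made up for illustration and are not part of the original answer.

```python
def replace_bullets(raw_html, replacement="*", encoding="utf-8"):
    """Hypothetical helper: decode raw bytes, then replace U+2022 (bullet)."""
    text = raw_html.decode(encoding)             # step 1: bytes -> str
    return text.replace("\u2022", replacement)   # step 2: replace on text


# Sample bytes standing in for what a network read would return (assumed UTF-8).
raw = "<li>\u2022 first</li><li>\u2022 second</li>".encode("utf-8")

cleaned = replace_bullets(raw)

# Step 3: encode back to bytes only right before writing to a file or socket.
encoded = cleaned.encode("utf-8")

print(cleaned)
```

Keeping the value as text until the last moment means every intermediate operation works on characters rather than on encoded bytes.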

Fred Foo answered Sep 21 '22 19:09