
Python HTML Encoding \xc2\xa0

I've been struggling with this one for a while. I'm trying to write strings to HTML but have issues with the format once I've cleaned them. Here's an example:

paragraphs = ['Grocery giant and household name Woolworths is battered and bruised. ', 
'But behind the problems are still the makings of a formidable company']

x = str(" ")
for item in paragraphs:
    x = x + str(item)
x

Output:

"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised. 
But behind the problems are still the makings of a formidable\xc2\xa0company"

Desired output:

"Grocery giant and household name Woolworths is battered and bruised. 
But behind the problems are still the makings of a formidable company"

I'm hoping you're able to explain why this happens and how I can fix it. Thanks in advance!

Sam Perry asked Sep 06 '15 02:09



People also ask

What is \xc2\xa0?

\xc2\xa0 is the byte pair 0xC2 0xA0, the UTF-8 encoding of the non-breaking space (U+00A0). It is an invisible whitespace character that prevents a line break at its position.

How do I strip \xa0 in Python?

Use the str.replace() method to remove \xa0 from a string, e.g. result = my_str.replace('\xa0', ' '). It returns a new string in which every occurrence of \xa0 (the non-breaking space) is replaced by a regular space.
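
As a minimal sketch of the replacement described above, assuming Python 3, where str holds Unicode text. The sample string is made up for illustration. On Python 2 byte strings you would likely need to replace the two-byte sequence '\xc2\xa0' instead.

```python
# Hypothetical sample text containing non-breaking spaces (U+00A0).
my_str = "household name\xa0Woolworths is battered and\xa0bruised."

# str.replace returns a new string; the original is left unchanged.
result = my_str.replace("\xa0", " ")

print(result)
```

Because strings are immutable, the result has to be assigned to a name (here result) for the change to be kept.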


1 Answer

\xc2\xa0 is the byte sequence 0xC2 0xA0, the UTF-8 encoding of the so-called non-breaking space (U+00A0).

It is an invisible whitespace character that prevents a line break at its position. For more information, see Wikipedia: https://en.wikipedia.org/wiki/Non-breaking_space

I copied the code you pasted in the question and got the expected output.
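
To make the link between the character and the two bytes concrete, here is a short sketch, assuming Python 3. The paragraphs values are a guess at what the original data looked like (UTF-8 byte strings with embedded non-breaking spaces), and clean_paragraphs is a hypothetical helper name, not something from the question.

```python
NBSP = "\u00a0"  # non-breaking space, U+00A0

# Encoding the character as UTF-8 yields the two bytes seen in the question.
assert NBSP.encode("utf-8") == b"\xc2\xa0"


def clean_paragraphs(raw_paragraphs):
    """Decode UTF-8 byte strings, join them, and swap NBSP for a plain space."""
    text = "".join(p.decode("utf-8") for p in raw_paragraphs)
    return text.replace(NBSP, " ")


# Assumed input: raw bytes as they might arrive from a scraped page.
paragraphs = [
    b"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised. ",
    b"But behind the problems are still the makings of a formidable\xc2\xa0company",
]

cleaned = clean_paragraphs(paragraphs)
print(cleaned)
```

In Python 2 a plain str is a byte string, and the interactive prompt shows its repr, which is probably why the escapes appear in the question's output even though the text prints normally.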

liuyix answered Oct 04 '22 06:10
