I've been struggling with this one for a while. I'm trying to write strings to HTML but have issues with the format once I've cleaned them. Here's an example:
```python
paragraphs = ['Grocery giant and household name Woolworths is battered and bruised. ',
              'But behind the problems are still the makings of a formidable company']
x = ""
for item in paragraphs:
    x = x + str(item)
x
```
Output:
"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised.
But behind the problems are still the makings of a formidable\xc2\xa0company"
Desired output:
"Grocery giant and household name Woolworths is battered and bruised.
But behind the problems are still the makings of a formidable company"
I'm hoping you can explain why this happens and how I can fix it. Thanks in advance!
Use the `str.replace()` method to remove `\xa0` from a string, e.g. `result = my_str.replace('\xa0', ' ')`. This replaces every occurrence of the `\xa0` (non-breaking space) character with a regular space.
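A minimal sketch of that fix applied to the question's list, assuming Python 3 `str` objects. The `'\xa0'` escapes are written out explicitly here (an assumption about where the hidden characters sit, based on the question's output) so they survive copy/paste. It also shows `unicodedata.normalize('NFKC', ...)` as an alternative; note that NFKC rewrites other compatibility characters too, not just the non-breaking space.

```python
import unicodedata

# The question's strings, with the non-breaking spaces (U+00A0)
# written explicitly as '\xa0' so they are visible in the source.
paragraphs = [
    'Grocery giant and household name\xa0Woolworths is battered and\xa0bruised. ',
    'But behind the problems are still the makings of a formidable\xa0company',
]

# Join the pieces, then swap every non-breaking space for a regular space.
text = ''.join(paragraphs).replace('\xa0', ' ')
print(text)

# Alternative: NFKC normalization maps U+00A0 to an ordinary space (U+0020),
# but it also normalizes other compatibility characters in the string.
normalized = unicodedata.normalize('NFKC', ''.join(paragraphs))
print(normalized == text)
```

For plain ASCII prose like this, both approaches give the same result; `replace()` is the narrower, more predictable choice.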
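`\xc2\xa0` is the byte sequence 0xC2 0xA0, the UTF-8 encoding of U+00A0, the so-called non-breaking space. It renders like an ordinary space, which is why you can't see it in your input, but it is a different character, so it isn't treated as a plain space. For more information, see Wikipedia: https://en.wikipedia.org/wiki/Non-breaking_space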
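A short sketch of the encoding relationship described above, assuming Python 3. A `repr` showing `\xc2\xa0` suggests the data is held as UTF-8 bytes rather than decoded text; the `raw` value below is a made-up example of such bytes, decoded before the replacement is applied.

```python
import unicodedata

nbsp = '\xa0'                  # U+00A0: a single code point
print(unicodedata.name(nbsp))  # NO-BREAK SPACE
print(nbsp.encode('utf-8'))    # b'\xc2\xa0' (two bytes in UTF-8)
print(nbsp == ' ')             # False: it looks like a space but is not one

# Hypothetical UTF-8 byte input: decode to text first, then replace.
raw = b'formidable\xc2\xa0company'
clean = raw.decode('utf-8').replace('\xa0', ' ')
print(clean)                   # formidable company
```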
I copied what you pasted in the question and got the expected output.
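Because copy/paste can silently turn non-breaking spaces into ordinary spaces, it can help to inspect the original data directly. A possible sketch, assuming Python 3; `find_non_ascii` is a hypothetical helper, not a library function, that lists every non-ASCII character with its index and Unicode name.

```python
import unicodedata

def find_non_ascii(text):
    """Hypothetical helper: return (index, char, Unicode name) for non-ASCII chars."""
    return [
        (i, ch, unicodedata.name(ch, 'UNKNOWN'))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

sample = 'household name\xa0Woolworths'
print(repr(sample))            # 'household name\xa0Woolworths'
print(find_non_ascii(sample))  # [(14, '\xa0', 'NO-BREAK SPACE')]
```

Printing `repr()` alone is often enough; the helper just makes the location of each hidden character explicit.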