 

Removing non-breaking spaces from strings using Python

I am having some trouble with a very basic string issue in Python (that I can't figure out). Basically, I am trying to do the following:

# read file into a string
myString = file.read()
# Attempt to remove non-breaking spaces
myString = myString.replace("\u00A0", " ")
# however, when I print my string to the console, I get:
Foo <C2><A0> Bar

I thought that "\u00A0" was the escape code for the Unicode non-breaking space, but apparently I am not using it properly. Any ideas on what I am doing wrong?
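The <C2><A0> in that output is exactly the UTF-8 encoding of U+00A0, the non-breaking space. A quick Python 3 check of that claim (the question's code is Python 2, so this is an illustration rather than a reproduction of the asker's environment):

```python
# U+00A0 NO-BREAK SPACE as a single Unicode code point
nbsp = "\u00a0"

# Encoding it as UTF-8 yields two bytes: 0xC2 followed by 0xA0
encoded = nbsp.encode("utf-8")

print(encoded)                  # b'\xc2\xa0'
print(encoded.hex())            # c2a0
print(len(nbsp), len(encoded))  # 1 2
```

So a byte string read from a UTF-8 file contains two bytes where the text contains one character, and a replacement has to target one form or the other consistently.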

asked Apr 07 '10 18:04 by dontsaythekidsname



People also ask

How do you remove unwanted spaces in a string in Python?

Python's str.strip() method removes leading and trailing whitespace. To remove only leading or only trailing whitespace, use lstrip() or rstrip() instead.
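A small illustration of the three methods on a made-up sample string; repr() is used so the surviving whitespace is visible:

```python
text = "   hello world \t\n"

# strip() removes whitespace from both ends
print(repr(text.strip()))   # 'hello world'
# lstrip() removes it only from the left (leading) end
print(repr(text.lstrip()))  # 'hello world \t\n'
# rstrip() removes it only from the right (trailing) end
print(repr(text.rstrip()))  # '   hello world'
```

Note that none of them touch whitespace in the middle of the string.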

How do I remove all spaces from a string?

In Java, the replaceAll() method of the String class replaces each substring that matches the given regular expression with the given replacement. You can remove all space characters from a string by replacing " " with "".
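replaceAll() is a Java method; a rough Python counterpart, sketched on a made-up sample, distinguishes between removing only the ASCII space and removing every kind of whitespace:

```python
text = "a b\tc\u00a0d"

# Removes only the ASCII space character U+0020
no_spaces = text.replace(" ", "")

# Removes every whitespace character: str.split() with no argument
# splits on any Unicode whitespace, including tabs and U+00A0
no_whitespace = "".join(text.split())

print(repr(no_spaces))      # 'ab\tc\xa0d'
print(repr(no_whitespace))  # 'abcd'
```

The second form is the one that also catches non-breaking spaces once the text has been decoded to a Unicode string.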

How do you remove spaces and special characters from a string in Python?

Using str.replace(), you can replace a specific character. To remove that character, replace it with an empty string; str.replace() replaces every occurrence of the given character (or substring).
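A short sketch of both cases on a made-up string: str.replace() for one character, and str.translate() with str.maketrans() (whose third argument lists characters to delete) for several characters in one pass. The set of characters treated as "special" here is an arbitrary choice for the example:

```python
text = "Hello, world! (v2.0)"

# Remove a single character by replacing it with the empty string
no_commas = text.replace(",", "")

# Remove several characters at once; the third argument to
# str.maketrans is the set of characters to delete
to_delete = " ,!()."
cleaned = text.translate(str.maketrans("", "", to_delete))

print(no_commas)  # Hello world! (v2.0)
print(cleaned)    # Helloworldv20
```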

How do I remove Unicode characters from a string in Python?

In Python, one way to remove non-ASCII Unicode characters from a string is to encode it with str.encode() using the "ascii" codec and the "ignore" error handler, which drops every character that ASCII cannot represent.
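A minimal sketch of the encode-based approach, assuming the goal is to drop every non-ASCII character (the sample string is made up):

```python
text = "caf\u00e9\u00a0menu"

# "ignore" silently drops every character with no ASCII encoding,
# here both the accented e and the non-breaking space
ascii_bytes = text.encode("ascii", "ignore")
ascii_text = ascii_bytes.decode("ascii")

print(ascii_text)  # cafmenu
```

This is blunt: it removes legitimate letters such as é along with the unwanted characters, so replacing specific code points is usually the better fix when only a few characters are the problem.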


2 Answers

You don't have a Unicode string, but a UTF-8-encoded sequence of bytes (which is what str is in Python 2.x).

Try

myString = myString.replace("\xc2\xa0", " ") 
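For reference, a minimal Python 3 sketch of the same byte-level replacement, assuming the data was read as bytes (for example from a file opened in "rb" mode; an in-memory sample stands in for the file). In Python 3 both literals need the b prefix:

```python
# UTF-8 bytes as they would come from a file opened with mode "rb"
raw = "Foo\u00a0Bar".encode("utf-8")  # b'Foo\xc2\xa0Bar'

# Replace the two-byte UTF-8 sequence for U+00A0 with an ASCII space
fixed = raw.replace(b"\xc2\xa0", b" ")

print(fixed)  # b'Foo Bar'
```

Replacing the raw byte pair is safe for UTF-8 because 0xC2 is always a lead byte, so the pair C2 A0 can only be the encoding of U+00A0.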

Better would be to switch to Unicode; see this article for ideas. Then you could write

uniString = unicode(myString, "UTF-8")
uniString = uniString.replace(u"\u00A0", " ")

and it should also work (caveat: I don't have Python 2.x available right now), although you will need to encode it back to bytes when writing it to a file or printing it to the screen.
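A minimal Python 3 sketch of that decode, replace, encode round trip. The helper name replace_nbsp is hypothetical, and UTF-8 is assumed to be the data's encoding:

```python
def replace_nbsp(data: bytes, encoding: str = "utf-8") -> bytes:
    """Decode bytes, swap U+00A0 for a regular space, and re-encode."""
    text = data.decode(encoding)        # bytes -> str (Unicode)
    text = text.replace("\u00a0", " ")  # operate on code points, not bytes
    return text.encode(encoding)        # str -> bytes for output


print(replace_nbsp("Foo\u00a0Bar".encode("utf-8")))  # b'Foo Bar'
```

In Python 3, opening a file in text mode with encoding="utf-8" performs the decode step for you, so file.read() already returns a str and the replacement can be done directly on it.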

answered Sep 19 '22 02:09 by Kathy Van Stone



No, u"\u00A0" is the escape code for a non-breaking space. In a Python 2 byte string, "\u00A0" (without the u prefix) is six characters that are not any sort of escape code. Read this.
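In Python 3 the situation differs slightly: every string literal is Unicode, so "\u00A0" is one character, and the six-character sequence a Python 2 byte string would contain can be reproduced with a raw string. A small check:

```python
# In Python 3, "\u00A0" in a normal literal is an escape for one character
escaped = "\u00A0"
# A raw string keeps the backslash, giving the six characters \ u 0 0 A 0
literal = r"\u00A0"

print(len(escaped), len(literal))  # 1 6
print(escaped == "\xa0")           # True
```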

answered Sep 20 '22 02:09 by Ignacio Vazquez-Abrams
