 

Python: replace nonbreaking space in Unicode

Tags: python, unicode

In Python, I have some Unicode text that contains non-breaking spaces, which I want to convert to 'x'. Non-breaking spaces are equal to chr(160). The following code works great when I run it under Django from Eclipse on localhost: there are no errors, and every non-breaking space is converted.

my_text = u"hello"
my_new_text = my_text.replace(chr(160), "x")

However, when I run it any other way (from the Python command line, or with Django's runserver instead of Eclipse), I get an error:

'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

I guess this error makes sense because it's trying to compare Unicode (my_text) to something that isn't Unicode. My questions are:
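In Python 2, calling replace on a unicode string with byte-string arguments makes Python decode those bytes implicitly using the default encoding, which is ASCII unless something changes it. The sketch below assumes Python 3, where that decode has to be written out explicitly and b"\xa0" stands in for Python 2's chr(160). It reproduces the same message when byte 0xa0 is decoded as ASCII and shows that a codec such as Latin-1 accepts the same byte. The helper name try_decode is hypothetical.

```python
# Python 3 sketch: b"\xa0" plays the role of Python 2's chr(160),
# a one-byte string with no encoding attached to it.
raw = b"\xa0"


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


# ASCII only covers 0-127, so byte 0xa0 is rejected with the same
# message Python 2 raises when it implicitly decodes chr(160).
ascii_result = try_decode(raw, "ascii")

# Latin-1 maps every byte to the code point with the same value,
# so 0xa0 decodes to U+00A0 NO-BREAK SPACE.
latin1_result = try_decode(raw, "latin-1")

print(ascii_result)
print(repr(latin1_result))
```

If an environment switched Python 2's default encoding to something like Latin-1, the implicit decode would succeed in the same way, which is one possible explanation for behavior that differs between launchers.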

  1. If chr(160) isn't Unicode, what is it?
  2. How come this works when I run it from Eclipse? I have been testing all my code from Eclipse, so understanding this would help me determine whether I need to change other parts of it.
  3. (most important) How do I solve my original problem of removing the non-breaking spaces? my_text is definitely going to be Unicode.
asked Jul 11 '12 at 16:07 by user984003



1 Answer

  1. In Python 2, chr(160) is a byte string of length one whose only byte has value 160, or hex a0. There's no meaning attached to it except in the context of a specific encoding.
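  2. I'm not familiar with Eclipse, but it may be changing Python's default encoding from ASCII to one under which byte 0xa0 decodes without error (Latin-1, for example), so the implicit conversion of chr(160) to Unicode succeeds there.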
  3. If you want the Unicode character NO-BREAK SPACE, i.e. code point 160, that's unichr(160), which you can also write as the literal u"\u00a0".

E.g.,

>>> u"hello\u00a0world".replace(unichr(160), "X")
u'helloXworld'
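For comparison, Python 3 has no separate unicode type: str is always Unicode, unichr was removed, and chr(160) returns the NO-BREAK SPACE character itself, so the replace call from the question works as written. The sketch below assumes Python 3. The u"\u00a0" literal form is also valid in Python 2 and in Python 3.3 and later, where the u prefix is accepted again.

```python
import unicodedata

# In Python 3, chr() returns a one-character str (a Unicode string),
# so chr(160) is U+00A0 NO-BREAK SPACE rather than a raw byte.
nbsp = chr(160)
assert nbsp == "\u00a0"
print(unicodedata.name(nbsp))  # NO-BREAK SPACE

text = "hello\u00a0world"

# Replacing chr(160) works the way the question originally intended.
with_chr = text.replace(nbsp, "X")
print(with_chr)  # helloXworld

# The \u00a0 escape in a u"" literal avoids chr/unichr entirely.
with_literal = text.replace(u"\u00a0", u"X")
print(with_literal)  # helloXworld
```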
answered Oct 14 '22 at 05:10 by Fred Foo
