Translate special character ½

Question

I am reading a source that contains the special character ½. How do I convert it to 1/2? The character is part of a sentence, and I still need to be able to use the string "normally". I am reading webpage sources, so I'm not sure that I will always know the encoding.

Edit: I have tried looking at other answers, but they don't work for me. They always seem to start with something like:

s = u'£10'

but I get an error already there: "no encoding declared". Do I need to know what encoding I'm getting, or does that not matter? Do I just pick one?

Dietrich Epp · Accepted Answer

This is really two questions.

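#1. To interpret ½: Use the unicodedata module. You can ask for the numeric value of the character, or you can normalize it using a compatibility normalization form (NFKC or NFKD) and parse the result yourself. Note that the normalized string contains U+2044 FRACTION SLASH, not the ASCII slash.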

>>> import unicodedata
>>> unicodedata.numeric(u'½')
0.5
>>> unicodedata.normalize('NFKC', u'½')
'1⁄2'
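
To get the plain ASCII text the question asks for, the normalization step can be followed by swapping the fraction slash for an ordinary one. This is only a minimal sketch: the helper name ascii_fractions is made up for illustration, and the characters are written as \u escapes so the snippet does not depend on the source file's encoding.

```python
import unicodedata

FRACTION_SLASH = u"\u2044"  # the slash that NFKC puts between the digits


def ascii_fractions(text):
    """Replace vulgar fraction characters (e.g. U+00BD) with ASCII such as 1/2.

    Hypothetical helper: it normalizes one character at a time so that
    other characters in the sentence are left exactly as they were.
    """
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFKC", ch)
        if decomposed != ch and FRACTION_SLASH in decomposed:
            out.append(decomposed.replace(FRACTION_SLASH, u"/"))
        else:
            out.append(ch)
    return u"".join(out)


print(ascii_fractions(u"Add \u00bd cup of sugar"))
```

One caveat: a mixed number such as 2½ comes out as 21/2, so you may want to insert a space before the fraction if your text contains those.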

#2. Encoding problems: If you're working with the terminal, make sure Python knows the terminal encoding. If you're writing source files, make sure Python knows the file encoding. You can't just "pick" an encoding to set for Python; you must tell Python which encoding your terminal or text editor already uses.

Python lets you set the encoding of files with Vim/Emacs style comments. Put a comment at the top of the file like this if you use Vim:

# coding=UTF-8

Or this, if you use Emacs:

# -*- coding: UTF-8 -*-

If you use neither Vim nor Emacs, then it doesn't matter which one. Obviously, if you don't use UTF-8 you should substitute the encoding you actually use. (UTF-8 is the only encoding I can recommend.)

grncdr · Answer

Dietrich beat me to the punch, but here is some more detail about setting the encoding for your source file:

Because you want to search for a literal Unicode ½, you need to be able to write it in your source file. Unfortunately, the Python 2 interpreter rejects any non-ASCII character in a source file unless you declare the encoding of that file with a comment in the first couple of lines, like so:

 # coding=utf8
 # ... do stuff here ...

This assumes your editor is saving the file as UTF-8. If it's using a different encoding, specify that instead. See PEP 263 for more details.

Once you've specified the encoding, you should be able to write something like this in your code:

text = text.replace(u'½', u'1/2')

Encoding of the webpage

Depending on how you are downloading the page, you probably don't need to worry about this at all: most HTTP libraries handle choosing the encoding for you automatically.
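
If your library hands you raw bytes instead, the usual approach is to decode them with the charset named in the response's Content-Type header before doing any replacement. The following is a rough sketch for Python 3, not how any particular HTTP library does it: decode_body is a hypothetical helper, the sample bytes and header are invented, and falling back to UTF-8 is an assumption you may want to change.

```python
from email.message import Message


def decode_body(raw_bytes, content_type, fallback="utf-8"):
    """Decode a response body using the charset in a Content-Type header.

    Hypothetical helper: falls back to `fallback` (assumed UTF-8) when the
    header names no charset, an unknown charset, or the bytes do not fit.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback
    try:
        return raw_bytes.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or undecodable bytes: decode leniently instead.
        return raw_bytes.decode(fallback, errors="replace")


# Invented sample: the byte 0xBD is the one-half character in ISO-8859-1.
page = decode_body(b"Add \xbd cup", "text/html; charset=ISO-8859-1")
print(page.replace(u"\u00bd", u"1/2"))
```

Once the bytes are decoded to a Unicode string, the replacement works the same way regardless of which encoding the page originally used.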

Tags: python, unicode

user984003
