
Conversion of unicode minus sign ( from matplotlib ticklabels )

I'm having a problem with the Text objects that matplotlib uses to represent the tick labels.

For testing purposes I need to check the values of the tick labels that are created in a plot. If the label is a string or a positive number, there is no problem: a unicode string is returned, I test it (or convert it to a number, depending on the circumstances) and everything is fine.

But if the label is a negative number, what I get back is a mangled unicode string, for a reason I cannot understand.

Let's take this example code:

import pylab as plt
fig, ax = plt.subplots(1)
ax.plot([-1, 0, 1, 2], range(4))
labels = ax.get_xticklabels()

Now, if I ask for the text content of the second label (the 0), I get a normal unicode string:

labels[1].get_text()
# u'0.0'

but the text of the first one (the -1) is something strange:

labels[0].get_text()
# u'\u22121'

This prints correctly in the terminal, but here I need to compare it with a numerical value, and every conversion fails, with both int and float.

I tried to convert it to a UTF-8 string with

text = labels[0].get_text()
text.encode('utf8')
# '\xe2\x88\x921'

but again it prints correctly and raises an error when converted. I also looked at the unicodedata module, but it seems to convert only single characters, so it is useless here. I also tried normalizing the string with unicodedata.normalize in every available form, again with no success.

I moved on to the PyPI module unidecode (as suggested in "Python and character normalization"), again without any success:

from unidecode import unidecode
unidecode(text)
# '[?]1'

I also tried to rule out font issues using the solution in "Non-ASCII characters in Matplotlib", but with the same result (I'm not sure it is even related, since that is a display problem...). The question "Accented characters in Matplotlib" is similar in that respect: it is concerned with how the text is displayed, not with its value.

I'm starting to feel a little lost... I know that Python 2.7 has some unicode "difficulties", but normally I can work around them one way or another.

I know that the issue is the minus sign, as I can avoid the problem using a brute replacement of the culprit:

text.replace(u'\u2212', '-')
# u'-1'

But this is more a hack than a solution, and I'm almost certain that it's not stable across different systems, so I would like something closer to a proper fix.

I'm working with

  • python 2.7.3
  • matplotlib 1.2.0
  • pylab 1.7.0
  • IPython 0.13.1

on Kubuntu 12.10.

Thank you very much for your help!

EDIT:

Corrected the order of the plot arguments, as I had x and y inverted, sorry.

EDIT2:

Similar information is available at this link: http://www.coniferproductions.com/2012/12/17/unicode-character-dump-in-python/

At the end it shows how some books use a more aesthetically pleasing minus sign that the Python interpreter does not recognize as a valid character.

EDIT3:

Riddle solved. The character that matplotlib returns is the "MINUS SIGN" (U+2212), i.e. the typographically correct sign for the minus. The one the keyboard produces is in fact the "HYPHEN-MINUS", which is commonly used but not typographically correct. See Wikipedia for an explanation: http://en.wikipedia.org/wiki/Hyphen-minus.
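To see the difference between the two characters concretely, here is a small sketch using only the standard unicodedata module. It assumes Python 3, and the variable names are mine, chosen for illustration:

```python
import unicodedata

minus = u'\u2212'   # the character matplotlib puts in the tick label
hyphen = u'-'       # the character typed on the keyboard

# The two characters have different official Unicode names.
print(unicodedata.name(minus))
print(unicodedata.name(hyphen))

# Normalization does not turn one into the other: U+2212 has no
# decomposition, so every normalization form leaves it unchanged.
unchanged = all(
    unicodedata.normalize(form, minus) == minus
    for form in ('NFC', 'NFD', 'NFKC', 'NFKD')
)
print(unchanged)
```

This is consistent with what I saw above: unicodedata.normalize cannot help, because the two characters are simply distinct code points.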

So the simple replace I used is in fact the correct practical thing to do, but "ethically" it is a bug in Python (2.7 and 3.x alike), which does not recognize the correct symbol for the minus sign.

See the bug report at http://bugs.python.org/issue6632
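For the test code, the replacement can be wrapped in a small helper. This is only a sketch under assumptions: label_to_float is a hypothetical name of mine (not part of matplotlib), and it assumes the label is a plain decimal number apart from the minus sign:

```python
MINUS_SIGN = u'\u2212'  # U+2212, used by matplotlib for negative tick labels


def label_to_float(text):
    """Hypothetical helper: convert a tick label string to a float.

    Replaces the typographic minus sign with the ASCII hyphen-minus
    before handing the string to float().
    """
    return float(text.replace(MINUS_SIGN, u'-'))


# float() on its own rejects the typographic minus sign.
try:
    float(u'\u22121')
except ValueError as err:
    print('float() rejects it:', err)

print(label_to_float(u'\u22121'))
print(label_to_float(u'0.0'))
```

Labels that already use the ordinary hyphen-minus pass through unchanged, so the helper should be safe to apply to every label.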

EDIT4:

To disable this behavior there is a simple solution in matplotlib: set the axes.unicode_minus rcParam to False, either in the matplotlibrc file or programmatically.

import matplotlib as mpl
mpl.rcParams['axes.unicode_minus']=False

EnricoGiampieri asked Mar 21 '13 01:03


1 Answer

Use plt.xticks() instead of ax.get_xticklabels(). It returns the tick locations as a NumPy array, so no string parsing is needed:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1)
ax.plot([-1, 0, 1, 2], range(4))
plt.savefig('/tmp/test.png')
loc, labels = plt.xticks()
print(type(loc))
# <type 'numpy.ndarray'>
print(loc)
# [-1.  -0.5  0.   0.5  1.   1.5  2. ]

unutbu answered Oct 12 '22 08:10