
Python: unicode in system commands

Suppose I have a mysterious unicode string in Python (2.7) that I want to feed to a command line program such as imagemagick (or really just get it out of Python in any way). The strings might be:

  • Adolfo López Mateos
  • Stanisława Walasiewicz
  • Jörgen Jönsson

So in Python I might make a little command like this:

cmd = u'convert -pointsize 24 label:"%s" "%s.png"' % (name, name)

If I just print cmd, I get convert -pointsize 24 label:"Jörgen Jönsson" "Jörgen Jönsson.png", and when I run that myself, everything is fine.

  • Adolfo López Mateos.png
  • example 1 http://4u.jeffcrouse.info/stackoverflow/A-01.png
  • Stanisława Walasiewicz.png
  • example 2 http://4u.jeffcrouse.info/stackoverflow/A-02.png

But if I do os.system( cmd ), I get this:

  • Adolfo L√≥pez Mateos.png
  • example 4 http://4u.jeffcrouse.info/stackoverflow/B-01.png
  • Stanis≈Çawa Walasiewicz.png
  • example 5 http://4u.jeffcrouse.info/stackoverflow/B-02.png

I know it's not an imagemagick problem because the filenames are messed up too. I know that Python is converting the command to ascii when it passes it off to os.system, but why is it getting the encoding so wrong? Why is it interpreting each non-ASCII character as 2 characters? According to a few articles that I've read, it might be because it's encoded as latin-1 but it's being read as utf-8, but I've tried encoding it back and forth between them and it's not helping.

I get Unicode exceptions when I try to just encode it manually as ascii without a replacement argument, but if I do name.encode('ascii','xmlcharrefreplace'), I get the following:

  • example 4 http://4u.jeffcrouse.info/stackoverflow/C-01.png
  • example 5 http://4u.jeffcrouse.info/stackoverflow/C-02.png

I'm hoping that someone recognizes this particular kind of encoding problem and can offer some advice, because I'm about out of ideas.

Thanks!

asked Jan 11 '13 23:01 by jefftimesten



1 Answer

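Use subprocess.call with an argument list instead of os.system, so the string is handed to the program directly rather than through a shell command line: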

>>> s = u'Jörgen Jönsson'
>>> import subprocess
>>> subprocess.call(['echo', s])
Jörgen Jönsson
0
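
Applied to the convert command from the question, the same idea might look like the sketch below. The helper names build_convert_args and render_label are made up for illustration, and the sketch assumes ImageMagick's convert is on the PATH. On Python 2.7, how unicode arguments are encoded for the child process depends on the environment, so it may be necessary to encode each argument explicitly (for example with .encode('utf-8')); treat that as an assumption to check on your system.

```python
# -*- coding: utf-8 -*-
import subprocess


def build_convert_args(name):
    """Build the argument list for the convert call from the question.

    Each list element reaches the program as exactly one argument, so the
    label text and the file name need no shell quoting.
    """
    return [u'convert', u'-pointsize', u'24',
            u'label:%s' % name, u'%s.png' % name]


def render_label(name):
    """Run convert for one name and return its exit status.

    Hypothetical helper; assumes ImageMagick's convert is installed.
    """
    return subprocess.call(build_convert_args(name))


# Building the list has no side effects, so it can be inspected first.
args = build_convert_args(u'Jörgen Jönsson')
```

Calling render_label(u'Jörgen Jönsson') would then run the command without going through os.system or a shell.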
answered Oct 24 '22 08:10 by jterrace
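
On the question of why each non-ASCII character turns into two: the garbled names in the question are consistent with the UTF-8 bytes of the string being displayed as Mac Roman. In UTF-8, ó and ł each take two bytes, and Mac Roman maps every byte to its own character. The sketch below reproduces the effect. That Mac Roman is the encoding involved is an inference from the output shown, not something the question confirms, and garble is a made-up helper name.

```python
# -*- coding: utf-8 -*-
names = [u'Adolfo López Mateos', u'Stanisława Walasiewicz']


def garble(text):
    """Encode text as UTF-8, then misread those bytes as Mac Roman."""
    return text.encode('utf-8').decode('mac_roman')


# Each two-byte UTF-8 sequence becomes two Mac Roman characters.
garbled = [garble(n) for n in names]
```

If this matches what you see, the bytes being passed are probably valid UTF-8 already, and the mismatch lies in how the receiving side interprets them.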