
Python not getting raw binary from subprocess.check_call

How can I get subprocess.check_call to give me the raw binary output of a command? It seems to be encoding the output incorrectly somewhere.

Details:

I have a command that returns text like this:

some output text “quote” ...

(Those quotes are Unicode curly quotes; the closing one is U+201D, which is the bytes e2 80 9d in UTF-8.)

Here's how I'm calling the command:

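import subprocess
from tempfile import SpooledTemporaryFile

f_output = SpooledTemporaryFile()
subprocess.check_call(cmd, shell=True, stdout=f_output)
f_output.seek(0)  # rewind before reading what the child wrote
output = f_output.read()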

The problem is I get this:

>>> repr(output)
some output text ?quote? ...
>>> type(output)
<str>

(And if I call ord() on the '?' I get 63.) I'm on Python 2.7 on Linux.

Note: Running the same code on OS X works correctly for me. The problem only appears when I run it on a Linux server.

Greg asked May 16 '16 01:05


1 Answer

Wow, this was the weirdest issue ever but I've fixed it!

It turns out that the program it was calling (a Java program) was writing its output in a different encoding depending on where it was called from!

On my dev OS X machine it returns the characters fine. On the Linux server, run from the command line, it also returns them fine. Called from a Django app, though, they turn into "?"s.
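To convince yourself that Python is not the one mangling the bytes, a minimal sketch like the following may help. It assumes a stand-in child process (a second Python interpreter, not the real Java program) that writes UTF-8 curly quotes to its stdout; `CHILD_CODE` and `run_raw` are hypothetical names used only for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for the real command: a child process that writes
# UTF-8 encoded curly quotes to file descriptor 1 (stdout) as raw bytes.
CHILD_CODE = (
    "import os; "
    "os.write(1, b'some output text \\xe2\\x80\\x9cquote\\xe2\\x80\\x9d')"
)


def run_raw(argv):
    """Return the child's stdout exactly as the bytes it wrote."""
    return subprocess.check_output(argv)


raw = run_raw([sys.executable, "-c", CHILD_CODE])

# The bytes arrive untouched, so decoding them as UTF-8 recovers the quotes.
text = raw.decode("utf-8")
```

If the bytes that come back already contain 0x3f ('?') in place of the quotes, the replacement happened inside the child process, which is what turned out to be the case here.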

To fix this I ended up adding this argument to the command:

-Dfile.encoding=utf-8

I got that idea here, and it seems to work. There's also a way to modify the Java program internally to do that.
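As a rough sketch of how that flag could be added from the Python side, assuming the command is built as an argument list rather than a shell string: `add_file_encoding`, `tool.jar`, and `input.txt` below are hypothetical names, not part of the original command. Whether `file.encoding` controls the program's stdout can depend on the JVM version, so treat this as one option rather than a guaranteed fix.

```python
def add_file_encoding(argv, encoding="UTF-8"):
    """Return a copy of argv with -Dfile.encoding placed right after the launcher.

    If the flag is already present, the arguments are returned unchanged.
    """
    prefix = "-Dfile.encoding="
    if any(arg.startswith(prefix) for arg in argv):
        return list(argv)
    return [argv[0], prefix + encoding] + list(argv[1:])


# Hypothetical command line; substitute the real jar and arguments.
cmd = add_file_encoding(["java", "-jar", "tool.jar", "input.txt"])
```

The resulting list can be passed to subprocess.check_call or subprocess.check_output without shell=True. JVM options have to come before -jar, which is why the flag is inserted directly after the launcher.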

Sorry I blamed Python! You guys had the right idea.

Greg answered Sep 25 '22 00:09