Redirecting Python's stdout to a file fails with UnicodeEncodeError

I have a Python script that connects to the Twitter Firehose and sends data downstream for processing. It used to work fine, but now I'm trying to print only the text body. (This is not a question about how to extract data from Twitter or how to encode/decode ASCII characters.) When I launch my script directly like this:

python -u fetch_script.py

it works just fine, and I can see the messages appearing on the screen. For example:

root@domU-xx-xx-xx-xx:/usr/local/streaming# python -u fetch_script.py 
Cuz I'm checking you out >on Facebook<
RT @SearchlightNV: #BarryLies👳🎌 has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A…
"Why do men chase after women? Because they fear death."~Moonstruck
RT @SearchlightNV: #BarryLies👳🎌 has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A…
Never let anyone tell you not to chase your dreams. My sister came home crying today, because someone told her she's not good enough.
"I can't even ask anyone out on a date because if it doesn't end up in a high speed chase, I get bored."
RT @ColIegeStudent: Double-checking the attendance policy while still in bed
Well I just handed my life savings to ya.. #trustingyou #abouttomakebankkkkk
Zillow $Z and Redfin useless to Wells Fargo Home Mortgage, $WFC, and FannieMae $FNM. Sale history LTV now 48%, $360 appraisal fee 4 no PMI.
The latest Dump and Chase Podcast http://somedomain.com/viaRSA9W3i check it out and subscribe on iTunes, or your favorite android app #Isles

But if I try to redirect the output to a file like this:

python -u fetch_script.py >fetch_output.txt

it immediately throws an error:

root@domU-xx-xx-xx-xx:/usr/local/streaming# python -u fetch_script.py >fetch_output.txt
ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
    callback(*args)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
    self.parse_response(response)
  File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
    self._callback(response)
  File "fetch_script.py", line 57, in callback
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
ERROR:tornado.application:Exception in callback <functools.partial object at 0x187c2b8>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 458, in _run_callback
    callback()
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
    callback(*args)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
    self.parse_response(response)
  File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
    self._callback(response)
  File "fetch_script.py", line 57, in callback
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)

P.S.

A little more context:

The error happens in the callback function:

def callback(self, message):
        if message:
            msg = message
            msg_props = pika.BasicProperties()
            msg_props.content_type = 'application/text'
            msg_props.delivery_mode = 2
            #print self.count
            print msg['text']
            #self.count += 1
            ...

However, if I remove ['text'] and leave only print msg, both cases work like a charm.

Vor asked Oct 02 '13 19:10


1 Answer

Since nobody's jumped in yet, here's my shot. Python 2 sets stdout's encoding when writing to a console but not when writing to a file, so print falls back to the ASCII codec when the output is redirected. This script reproduces the problem:

import sys

msg = {'text': u'\u2026'}  # U+2026 (ellipsis), the character from the question's traceback
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
print msg['text']

Running it with stdout redirected shows the error:

$ python bad.py >/tmp/xxx
default encoding: None
Traceback (most recent call last):
  File "bad.py", line 5, in <module>
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 0: ordinal not in range(128)

Adding an explicit encoding to the script:

import sys

msg = {'text': u'\u2026'}
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
# sys.stdout.encoding is None when stdout is redirected, so fall back to UTF-8
encoding = sys.stdout.encoding or 'utf-8'
print msg['text'].encode(encoding)

solves the problem:

$ python good.py >/tmp/xxx
default encoding: None
$ cat /tmp/xxx
…
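
If you'd rather not call .encode() at every print, one option is to wrap the output stream once so that everything written to it gets encoded. What follows is only a sketch under a few assumptions: make_text_writer is a made-up helper name (not part of any library), UTF-8 is assumed to be an acceptable fallback, and an in-memory byte buffer stands in for the redirected stdout so the snippet runs on its own.

```python
import codecs
import io


def make_text_writer(byte_stream, encoding=None, fallback='utf-8'):
    """Wrap a byte stream so unicode text written to it is encoded first.

    Hypothetical helper for illustration. `encoding` plays the role of
    sys.stdout.encoding, which is None in Python 2 when stdout is
    redirected to a file; in that case `fallback` is used instead.
    """
    # codecs.getwriter returns a StreamWriter class for the given codec;
    # instantiating it with a stream gives an object whose write() encodes.
    return codecs.getwriter(encoding or fallback)(byte_stream)


# Stand-in for a redirected stdout: an in-memory byte buffer.
buf = io.BytesIO()
writer = make_text_writer(buf, encoding=None)

# Text containing U+2026, the character that triggered the error.
writer.write(u'chase your dreams\u2026\n')

# The buffer now holds the UTF-8 encoded bytes.
data = buf.getvalue()
```

In the Python 2 script from the question, the equivalent would presumably be something like sys.stdout = make_text_writer(sys.stdout, sys.stdout.encoding) before the first print, with the caveat that byte strings containing non-ASCII data written through such a wrapper can still fail. Python also reads the PYTHONIOENCODING environment variable at startup, so PYTHONIOENCODING=utf-8 python -u fetch_script.py >fetch_output.txt should have a similar effect without touching the code.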
tdelaney answered Nov 18 '22 12:11