Why is print so slow in Python 3.3 and how can I fix it?

Tags: performance, python, unicode, python-3.3

I just tried to run this script with Python 3.3. Unfortunately, it's about twice as slow as with Python 2.7.

#!/usr/bin/env python

from sys import stdin

def main():
    for line in stdin:
        try:
            fields = line.split('"', 6)
            print(fields[5])
        except:
            pass

if __name__ == '__main__':
    main()

Here are the results:

$ time zcat access.log.gz | python3 -m cProfile ./ua.py > /dev/null

real    0m13.276s
user    0m18.977s
sys     0m0.484s

$ time zcat access.log.gz | python2 -m cProfile ./ua.py > /dev/null

real    0m6.139s
user    0m11.693s
sys     0m0.408s

Profiling shows that the additional time is spent in print:

$ zcat access.log.gz | python3 -m cProfile ./ua.py | tail -15
   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1594(_handle_fromlist)
   196806    0.234    0.000    0.545    0.000 codecs.py:298(decode)
        1    0.000    0.000   13.598   13.598 ua.py:3(<module>)
        1    4.838    4.838   13.598   13.598 ua.py:6(main)
        1    0.000    0.000   13.598   13.598 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {built-in method hasattr}
  4300456    4.726    0.000    4.726    0.000 {built-in method print}
   196806    0.312    0.000    0.312    0.000 {built-in method utf_8_decode}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  4300456    3.489    0.000    3.489    0.000 {method 'split' of 'str' objects}

$ zcat access.log.gz | python2 -m cProfile ./ua.py | tail -10
   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    6.573    6.573 ua.py:3(<module>)
        1    3.894    3.894    6.573    6.573 ua.py:6(main)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  4300456    2.680    0.000    2.680    0.000 {method 'split' of 'str' objects}

How can I avoid this overhead? Does it have something to do with UTF-8?


asked Dec 14 '13 13:12

Timo

1 Answer

Python 3 decodes the data read from stdin (bytes to str) and encodes it again when writing to stdout (str to bytes); it is not so much the print() function itself that is slower here as these bytes-to-Unicode and Unicode-to-bytes conversions.
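A minimal sketch of those two conversions. It assumes an in-memory io.TextIOWrapper over io.BytesIO is a fair stand-in for sys.stdin and sys.stdout, and it assumes UTF-8 (the real streams pick their encoding from your locale):

```python
import io

# Stand-in for sys.stdin: a TextIOWrapper (text layer) over a binary buffer.
raw_in = io.BytesIO('caf\u00e9 "UA"\n'.encode('utf-8'))
text_in = io.TextIOWrapper(raw_in, encoding='utf-8', newline='\n')

line = text_in.readline()         # the bytes are decoded to a str here
print(type(line).__name__, ascii(line))
print(text_in.buffer is raw_in)   # .buffer exposes the binary layer underneath

# Stand-in for sys.stdout: a str written here is encoded back to bytes.
raw_out = io.BytesIO()
text_out = io.TextIOWrapper(raw_out, encoding='utf-8', newline='\n')
text_out.write(line)
text_out.flush()                  # push the encoded bytes down into raw_out
print(raw_out.getvalue())
```

Every line pays for one decode on the way in and one encode on the way out; the byte-oriented version below skips both.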

In your case you probably want to bypass this and deal with bytes only; you can access the underlying BufferedIOBase implementation through the .buffer attribute:

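from sys import stdin, stdout

try:
    # Python 3: the binary buffers underneath the text-mode streams
    bytes_stdin, bytes_stdout = stdin.buffer, stdout.buffer
except AttributeError:
    # Python 2: stdin/stdout already read and write bytes (str)
    bytes_stdin, bytes_stdout = stdin, stdout

def main():
    for line in bytes_stdin:
        try:
            fields = line.split(b'"', 6)
            bytes_stdout.write(fields[5] + b'\n')
        except IndexError:
            # Too few '"'-delimited fields on this line; skip it
            pass

if __name__ == '__main__':
    main()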
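To try the byte-oriented loop without piping in a real log, you could pass the binary streams in as parameters. This is a sketch only: the function name extract_user_agents is hypothetical, and the sample line assumes a combined-format access log (which the script name ua.py suggests), where fields[5] is the user agent:

```python
import io

def extract_user_agents(bytes_in, bytes_out):
    """Hypothetical helper: same loop as main(), but with the streams as arguments."""
    for line in bytes_in:
        try:
            fields = line.split(b'"', 6)
            bytes_out.write(fields[5] + b'\n')
        except IndexError:
            # Lines with fewer than five double quotes have no fields[5].
            pass

sample = (
    b'127.0.0.1 - - [14/Dec/2013:13:12:00 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"\n'
    b'malformed line\n'
)
out = io.BytesIO()
extract_user_agents(io.BytesIO(sample), out)
print(out.getvalue())
```

In the script itself you would call extract_user_agents(bytes_stdin, bytes_stdout).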

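You'll now have to call bytes_stdout.write() directly, because print() converts its arguments to str and writes them to a text stream (by default sys.stdout, a TextIOBase implementation); passing a binary stream with file= raises a TypeError.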
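A quick way to see this, using io.BytesIO as a stand-in for sys.stdout.buffer (an assumption for illustration; both are binary streams):

```python
import io

buf = io.BytesIO()
error = None
try:
    # print() converts its argument to str and calls buf.write(str)
    print('Mozilla/5.0', file=buf)
except TypeError as exc:
    error = exc
    print('print() to a binary stream failed:', exc)

# Writing bytes directly to the binary stream works.
buf.write(b'Mozilla/5.0' + b'\n')
print(buf.getvalue())
```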

Note that .split() now takes the bytes literal b'"', and that we append the bytes literal b'\n' ourselves; print() would normally add that newline for us.
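A small sketch of why the separator must be bytes too, using a made-up combined-format log line. Under Python 3, splitting bytes with a str separator raises a TypeError (under Python 2 both are the same str type, so it would work):

```python
line = b'127.0.0.1 - - [14/Dec/2013:13:12:00 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"\n'

fields = line.split(b'"', 6)  # bytes separator for a bytes line
user_agent = fields[5]        # b'Mozilla/5.0'
print(user_agent)

try:
    line.split('"', 6)        # str separator on bytes
    mixed_ok = True
except TypeError:             # raised under Python 3
    mixed_ok = False
print('mixing str and bytes works:', mixed_ok)
```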

The above is compatible with Python 2.6 and up. Python 2.5 doesn't support the b prefix.
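If you want to measure the gap yourself without a real log file, here is a rough sketch that times both approaches on in-memory streams. The sample data is made up, in-memory wrappers only approximate the real sys.stdin/sys.stdout, and the ratio will vary by machine and Python version, so treat it as a starting point rather than a definitive benchmark:

```python
import io
import timeit

# Hypothetical sample data: 100,000 copies of one combined-format log line.
LINE = b'127.0.0.1 - - [14/Dec/2013:13:12:00 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"\n'
DATA = LINE * 100000

def text_path():
    # Decode on read and encode on write, like the original script's print().
    src = io.TextIOWrapper(io.BytesIO(DATA), encoding='utf-8', newline='\n')
    raw_out = io.BytesIO()
    dst = io.TextIOWrapper(raw_out, encoding='utf-8', newline='\n')
    for line in src:
        fields = line.split('"', 6)
        print(fields[5], file=dst)
    dst.flush()
    return raw_out.getvalue()

def bytes_path():
    # Stay in bytes throughout, like the .buffer version above.
    out = io.BytesIO()
    for line in io.BytesIO(DATA):
        fields = line.split(b'"', 6)
        out.write(fields[5] + b'\n')
    return out.getvalue()

assert text_path() == bytes_path()  # both paths produce identical output
for fn in (text_path, bytes_path):
    print(fn.__name__, timeit.timeit(fn, number=3))
```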

answered Sep 21 '22 05:09

Martijn Pieters
