Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is print so slow in Python 3.3 and how can I fix it?

I just tried to run this script with Python 3.3. Unfortunately it's about twice as slow than with Python 2.7.

#!/usr/bin/env python

from sys import stdin

def main():
    for line in stdin:
        try:
            fields = line.split('"', 6)
            print(fields[5])
        except:
            pass

if __name__ == '__main__':
    main()

Here are the results:

$ time zcat access.log.gz | python3 -m cProfile ./ua.py > /dev/null

real    0m13.276s
user    0m18.977s
sys     0m0.484s

$ time zcat access.log.gz | python2 -m cProfile ./ua.py > /dev/null

real    0m6.139s
user    0m11.693s
sys     0m0.408s

Profiling shows that the additional time is spend in print:

$ zcat access.log.gz | python3 -m cProfile ./ua.py | tail -15
   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1594(_handle_fromlist)
   196806    0.234    0.000    0.545    0.000 codecs.py:298(decode)
        1    0.000    0.000   13.598   13.598 ua.py:3(<module>)
        1    4.838    4.838   13.598   13.598 ua.py:6(main)
        1    0.000    0.000   13.598   13.598 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {built-in method hasattr}
  4300456    4.726    0.000    4.726    0.000 {built-in method print}
   196806    0.312    0.000    0.312    0.000 {built-in method utf_8_decode}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  4300456    3.489    0.000    3.489    0.000 {method 'split' of 'str' objects}

$ zcat access.log.gz | python2 -m cProfile ./ua.py | tail -10
   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    6.573    6.573 ua.py:3(<module>)
        1    3.894    3.894    6.573    6.573 ua.py:6(main)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  4300456    2.680    0.000    2.680    0.000 {method 'split' of 'str' objects}

How can I avoid this overhead? Has it something to do with UTF-8?

like image 539
Timo Avatar asked Dec 14 '13 13:12

Timo


People also ask

Is Python print function slow?

It will be slower as you are having to perform a large number of prints, any extra processing is going to incur some performance penalty. Send item to a socket queue : the program will finish the writes first, and the console from the socket will print the output with a lag.

What slows down a Python program?

If your Python code runs too fast, a call to time. sleep() is a simple way to slow down your code.

What is the effect the print () function causes?

The function print() takes zero or more arguments and displays them on the screen. Moreover, it takes any data. The print() is sending the data to an output device, i.e., console. Sending the data to the console is the effect caused by print() function.


1 Answers

Python 3 decodes data read from stdin and encodes again to stdout; it is not so much the print() function that is slower here as the unicode-to-bytes conversion and vice-versa.

In your case you probably want to bypass this and deal with bytes only; you can access the underlying BufferedIOBase implementation through the .buffer attribute:

from sys import stdin, stdout

try:
    bytes_stdin, bytes_stdout = stdin.buffer, stdout.buffer
except AttributeError:
    bytes_stdin, bytes_stdout = stdin, stdout

def main():
    for line in bytes_stdin:
        try:
            fields = line.split(b'"', 6)
            bytes_stdout.write(fields[5] + b'\n')
        except IndexError:
            pass

if __name__ == '__main__':
    main()

You'll now have to use stdout.write() as print() insists on writing to the stdout TextIOBase implementation.

Note that the .split() now uses a bytes literal b'"' and we write a bytes-literal b'\n' as well (which normally would be taken care of by print()).

The above is compatible with Python 2.6 and up. Python 2.5 doesn't support the b prefix.

like image 91
Martijn Pieters Avatar answered Sep 21 '22 05:09

Martijn Pieters