
Why is Python code written at module level slower than the same code decomposed into a function?

I discovered some surprising Python behaviour while investigating the thread "Why is reading lines from stdin much slower in C++ than Python?".

If I run the simple Python code from that thread:

#!/usr/bin/env python
from __future__ import print_function
import time
import sys


count = 0
start_time = time.time()

for line in sys.stdin:
    count += 1

delta_sec = time.time() - start_time
if delta_sec > 0:  # strictly positive, so the division below cannot fail
    lines_per_sec = int(round(count/delta_sec))
    print("Read {0:n} lines in {1:.2f} seconds. LPS: {2:n}".format(count, delta_sec, lines_per_sec))

it runs at 11.5M LPS (lines per second). When I move the whole script body into a single function:

#!/usr/bin/env python
from __future__ import print_function
import time
import sys


def test(input):
    count = 0
    start_time = time.time()

    for line in input:
        count += 1

    delta_sec = time.time() - start_time
    if delta_sec > 0:  # strictly positive, so the division below cannot fail
        lines_per_sec = int(round(count/delta_sec))
        print("Read {0:n} lines in {1:.2f} seconds. LPS: {2:n}".format(count, delta_sec, lines_per_sec))


if __name__ == "__main__":
    test(sys.stdin)

the code speeds up to 23M LPS.

Why does this simple refactoring make my code two times faster?

I ran my tests with Python 2.7 on Ubuntu 13.10.

Eugene Krokhalev asked Jan 14 '14 06:01


1 Answer

Looking at the bytecode helped me answer this question. The bytecode for the loop in the first script is:

 10          58 SETUP_LOOP              27 (to 88)
             61 LOAD_NAME                3 (sys)
             64 LOAD_ATTR                6 (stdin)
             67 GET_ITER         
        >>   68 FOR_ITER                16 (to 87)
             71 STORE_NAME               7 (line)
 11          74 LOAD_NAME                4 (count)
             77 LOAD_CONST               4 (1)
             80 INPLACE_ADD      
             81 STORE_NAME               4 (count)
             84 JUMP_ABSOLUTE           68
        >>   87 POP_BLOCK

And the bytecode for the corresponding part of the second script is:

 12          18 SETUP_LOOP              24 (to 45)
             21 LOAD_FAST                0 (input)
             24 GET_ITER
        >>   25 FOR_ITER                16 (to 44)
             28 STORE_FAST               3 (line)
 13          31 LOAD_FAST                1 (count)
             34 LOAD_CONST               2 (1)
             37 INPLACE_ADD
             38 STORE_FAST               1 (count)
             41 JUMP_ABSOLUTE           25
        >>   44 POP_BLOCK
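Listings like the two above can be reproduced with the standard dis module. The following is only a minimal sketch, assuming Python 3.4 or newer (dis.get_instructions does not exist in Python 2.7, where dis.dis prints the listing instead). The names MODULE_SRC, count_lines and variable_opcodes are hypothetical helpers introduced for illustration, and the exact opcode names vary between interpreter versions.

```python
import dis

# Module-level version of the loop, kept as source text so it can be
# compiled in "exec" mode, the same mode used for a script body.
MODULE_SRC = """
count = 0
for line in lines:
    count += 1
"""


def count_lines(lines):
    """The same loop, but inside a function."""
    count = 0
    for line in lines:
        count += 1
    return count


def variable_opcodes(code):
    """Return the sorted opcode names that deal with variable access."""
    return sorted({
        ins.opname
        for ins in dis.get_instructions(code)
        if "NAME" in ins.opname or "GLOBAL" in ins.opname or "FAST" in ins.opname
    })


module_ops = variable_opcodes(compile(MODULE_SRC, "<module>", "exec"))
function_ops = variable_opcodes(count_lines)

print("module level:", module_ops)
print("in function: ", function_ops)
```

Calling dis.dis(count_lines) instead prints a full listing in the same layout as the ones shown here.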

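The only real difference between the two listings is the opcodes used for variable access: LOAD_NAME vs LOAD_FAST and STORE_NAME vs STORE_FAST. According to the documentation (http://docs.python.org/2.7/library/dis.html#opcode-LOAD_FAST), LOAD_FAST fetches a local variable by its index in co_varnames, while LOAD_NAME looks the variable up by its string name from co_names, which requires a dictionary lookup at run time. The index-based access used inside the function is what makes the second script about two times faster.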
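The effect can be checked without stdin by timing the same loop in both forms. This is a rough sketch under stated assumptions, not the original benchmark: the loop runs over an in-memory list (DATA) instead of sys.stdin, the module-level variant is simulated by exec-ing code compiled in "exec" mode, and all names (MODULE_CODE, run_module_level, run_in_function, DATA) are made up for illustration. The measured ratio depends on the interpreter version and the machine.

```python
import timeit

# Loop compiled as module-level code: variables are accessed by name.
MODULE_SRC = """
count = 0
for line in lines:
    count += 1
"""
MODULE_CODE = compile(MODULE_SRC, "<module>", "exec")


def run_module_level(lines):
    """Execute the module-level loop in a fresh namespace dictionary."""
    namespace = {"lines": lines}
    exec(MODULE_CODE, namespace)
    return namespace["count"]


def run_in_function(lines):
    """Execute the same loop with function-local variables."""
    count = 0
    for line in lines:
        count += 1
    return count


# Stand-in for the lines read from stdin (an assumption of this sketch).
DATA = ["x\n"] * 200000

# Take the best of several repeats to reduce timing noise.
module_sec = min(timeit.repeat(lambda: run_module_level(DATA), number=5, repeat=3))
function_sec = min(timeit.repeat(lambda: run_in_function(DATA), number=5, repeat=3))

print("module level: {0:.4f} s".format(module_sec))
print("in function:  {0:.4f} s".format(function_sec))
```

Both variants count the same number of lines, so any difference in the timings comes from how the interpreter reaches count and line rather than from the work done.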

Eugene Krokhalev answered Oct 01 '22 22:10