Why is a=a*100 almost two times faster than a*=100? [duplicate]

Question

Following up on the question "Chaining *= += operators" and Tom Wojcik's helpful comment ("Why would you assume aaa *= 200 is faster than aaa = aaa * 200 ?"), I tested it in a Jupyter notebook:

%%timeit aaa = np.arange(1,101,1)
    aaa*=100

%%timeit aaa = np.arange(1,101,1)
    aaa=aaa*100

I was surprised that the first test takes longer than the second: 1530 ns and 952 ns, respectively. Why are these values so different?

Kir Chou · Accepted Answer

TL;DR: this question comes down to the performance difference between inplace_binop (INPLACE_*), used for aaa*=100, and binop (BINARY_*), used for aaa=aaa*100. You can see the difference with the dis module:

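import dis

import numpy as np

# dis only compiles the source text below; this array is kept to mirror the question.
aaa = np.arange(1, 101, 1)

# Disassemble the in-place form.
dis.dis('''
for i in range(1000000):
  aaa*=100
''')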

  3          14 LOAD_NAME                2 (aaa)
             16 LOAD_CONST               1 (100)
             18 INPLACE_MULTIPLY
             20 STORE_NAME               2 (aaa)
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               2 (None)
             28 RETURN_VALUE

dis.dis('''
for i in range(1000000):
  aaa=aaa*100
''')

  3          14 LOAD_NAME                2 (aaa)
             16 LOAD_CONST               1 (100)
             18 BINARY_MULTIPLY
             20 STORE_NAME               2 (aaa)
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               2 (None)
             28 RETURN_VALUE
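
As a rough sketch, the two listings can also be compared programmatically instead of by eye. The helper name opcodes below is made up for illustration. Note that the opcode names depend on the interpreter version: on Python 3.11 and later both statements compile to BINARY_OP and differ only in its argument (*= versus *), so the sketch compares the argument text as well.

```python
import dis


def opcodes(source):
    """Compile a statement and return (opname, argrepr) for each instruction.

    Hypothetical helper, not part of the dis module.
    """
    code = compile(source, "<sketch>", "exec")
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(code)]


inplace = opcodes("aaa *= 100")
binary = opcodes("aaa = aaa * 100")

# Keep only the positions where the two instruction streams disagree.
differing = [(a, b) for a, b in zip(inplace, binary) if a != b]

for a, b in differing:
    print(a, "vs", b)
```

Everything except the multiply instruction should be identical, which is what the two disassemblies above show.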

Back to your question: which one is always faster?

Unfortunately, it's hard to say which one is faster. Here's why:

You can check compile.c in the CPython source directly. If you trace through it a little, this is how the call paths differ:

inplace_binop -> compiler_augassign -> compiler_visit_stmt
binop -> compiler_visit_expr1 -> compiler_visit_expr -> compiler_visit_kwonlydefaults

Since the function calls and logic are different, there are many factors (including your input size(*), your CPU, etc.) that could affect performance. You'll need to profile to optimize your code for your use case.
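
To make "the logic is different" a bit more concrete, here is a minimal sketch of how the two forms reach different special methods at runtime: *= tries __imul__ first, while * calls __mul__. The class Tracked is a hypothetical stand-in written for this example, not NumPy code. NumPy's ndarray defines both methods, so the same split applies to it.

```python
class Tracked:
    """Hypothetical stand-in for an array type; records which hook ran."""

    def __init__(self, value, calls=None):
        self.value = value
        self.calls = list(calls or [])

    def __mul__(self, other):
        # Binary form: build and return a new object.
        return Tracked(self.value * other, self.calls + ["__mul__"])

    def __imul__(self, other):
        # In-place form: mutate self and return the same object.
        self.value *= other
        self.calls.append("__imul__")
        return self


a = Tracked(3)
a_before = a
a *= 100
inplace_same_object = a is a_before

b = Tracked(3)
b_before = b
b = b * 100
binary_same_object = b is b_before

print(a.calls, inplace_same_object)
print(b.calls, binary_same_object)
```

The in-place form keeps the original object and the binary form rebinds the name to a new one, so the work done under each opcode is not the same even though the bytecode differs by a single instruction.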

*: as another commenter pointed out, you can check this post to see how performance varies with input size.
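
A minimal profiling sketch along those lines, assuming NumPy is installed. The function name time_both, the chosen sizes, and the repeat counts are all arbitrary choices for illustration. The array is created in the timeit setup so that aaa is a local name in the timed code, much like the %%timeit cells in the question.

```python
import timeit

import numpy as np


def time_both(size, number=1000, repeat=5):
    """Return the best per-call time in nanoseconds for both forms.

    Hypothetical helper; the names and defaults are arbitrary.
    """
    results = {}
    statements = (("inplace", "aaa *= 100"), ("binary", "aaa = aaa * 100"))
    for label, stmt in statements:
        # Repeated multiplication overflows the integer dtype and wraps;
        # only the elapsed time is of interest here.
        timings = timeit.repeat(
            stmt,
            setup="aaa = np.arange(1, size + 1)",
            globals={"np": np, "size": size},
            number=number,
            repeat=repeat,
        )
        results[label] = min(timings) / number * 1e9
    return results


for size in (100, 10_000, 100_000):
    # Use fewer iterations for larger arrays to keep the run short.
    number = max(20, 200_000 // size)
    result = time_both(size, number=number)
    print(size, {k: round(v) for k, v in result.items()})
```

The numbers will differ between machines, Python versions, and NumPy versions, so treat the output as a measurement of your own setup rather than a general ranking.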

Tags: python, numpy

Stef1611
