I'm calculating the euclidean distance between two vectors represented by tuples.
(u[0]-v[0])**2 + (u[1]-v[1])**2 + (u[3]-v[3])**2 ...
The hard-coded way of doing this is pretty fast. However, I would like to make no assumptions about the length of these vectors. That results in solutions like:
sum([(a-b)**2 for a, b in izip(u, v)]) # Faster without generator
or
sum = 0
for i in xrange(len(u)):
sum += (u[i]-v[i])**2
which turn out to be much (at least twice) slower than the first version. Is there some smart way of optimizing this, without resorting to NumPy/SciPy? I'm aware that those packages offer the fastest way of doing such things, but at the moment, I'm more trying to get experience with optimizing "bare Python". What I found works fast is to dynamically build a string that defines the function and exec()
it, but that's really a last resort, I would say...
The requirements:
exec()
), pure PythonEven though my question is about the matter of small operations in general, you may assume in your solution that one of the vectors remains the same over several function calls.
mysum = 0
for a, b in izip(u, v) :
mysum += (a-b)**2
About 35% faster than #3
PS: have you tried Cython (not CPython) or Shedskin?
What I'm understanding is that you don't really need to make the code faster, you just want to know why it's slower. To answer that, let's look at the disassembly. For the purposes of this discussion, I'm going to wrap each method in a function call, the loading of u
and v
and the return command can be ignored in each disassembly.
def test1(u, v):
return (u[0]-v[0])**2 + (u[1]-v[1])**2 + (u[3]-v[3])**2
dis.dis(test1)
0 LOAD_FAST 0 (u)
3 LOAD_CONST 1 (0)
6 BINARY_SUBSCR
7 LOAD_FAST 1 (v)
10 LOAD_CONST 1 (0)
13 BINARY_SUBSCR
14 BINARY_SUBTRACT
15 LOAD_CONST 2 (2)
18 BINARY_POWER
19 LOAD_FAST 0 (u)
22 LOAD_CONST 3 (1)
25 BINARY_SUBSCR
26 LOAD_FAST 1 (v)
29 LOAD_CONST 3 (1)
32 BINARY_SUBSCR
33 BINARY_SUBTRACT
34 LOAD_CONST 2 (2)
37 BINARY_POWER
38 BINARY_ADD
39 LOAD_FAST 0 (u)
42 LOAD_CONST 4 (3)
45 BINARY_SUBSCR
46 LOAD_FAST 1 (v)
49 LOAD_CONST 4 (3)
52 BINARY_SUBSCR
53 BINARY_SUBTRACT
54 LOAD_CONST 2 (2)
57 BINARY_POWER
58 BINARY_ADD
59 RETURN_VALUE
I cut the first example off at a length of 3 because it would just repeat the same pattern over and over. You can quickly see that there is no function call overhead and pretty much the interpreter is doing the minimum possible work on these operands to achieve your result.
def test2(u, v):
sum((a-b)**2 for a, b in izip(u, v))
dis.dis(test2)
0 LOAD_GLOBAL 0 (sum)
3 LOAD_CONST 1 (<code object <genexpr> at 02C6F458, file "<pyshell#10>", line 2>)
6 MAKE_FUNCTION 0
9 LOAD_GLOBAL 1 (izip)
12 LOAD_FAST 0 (u)
15 LOAD_FAST 1 (v)
18 CALL_FUNCTION 2
21 GET_ITER
22 CALL_FUNCTION 1
25 CALL_FUNCTION 1
28 RETURN_VALUE
What we see here is that we create a function out of the generator expression, load 2 globals (sum and izip, global lookups are slower than local lookups, we can't avoid making them once but if they're going to be called in a tight loop, many people assign them to a local, such as _izip
or _sum
), and then suffer 4 expensive bytecode operations in a row, calling izip, getting the iterator from the generator, calling the function created by the generator, and then calling the sum function (which will consume the iterator and add each item before returning).
def test3(u, v):
sum = 0
for i in xrange(len(u)):
sum += (u[i]-v[i])**2
dis.dis(test3)
0 LOAD_CONST 1 (0)
3 STORE_FAST 2 (sum)
6 SETUP_LOOP 52 (to 61)
9 LOAD_GLOBAL 0 (xrange)
12 LOAD_GLOBAL 1 (len)
15 LOAD_FAST 0 (u)
18 CALL_FUNCTION 1
21 CALL_FUNCTION 1
24 GET_ITER
25 FOR_ITER 32 (to 60)
28 STORE_FAST 3 (i)
31 LOAD_FAST 2 (sum)
34 LOAD_FAST 0 (u)
37 LOAD_FAST 3 (i)
40 BINARY_SUBSCR
41 LOAD_FAST 1 (v)
44 LOAD_FAST 3 (i)
47 BINARY_SUBSCR
48 BINARY_SUBTRACT
49 LOAD_CONST 2 (2)
52 BINARY_POWER
53 INPLACE_ADD
54 STORE_FAST 2 (sum)
57 JUMP_ABSOLUTE 25
60 POP_BLOCK
61 LOAD_CONST 0 (None)
64 RETURN_VALUE
What we see here is a more straightforward version of what is happening in test2
. No generator expression or call to sum, but we've replaced that function call overhead with an unnecessary function call by doing xrange(len(u))
instead of the faster solution suggested by @Lucas Malor.
def test4(u, v):
mysum = 0
for a, b in izip(u, v) :
mysum += (a-b)**2
return mysum
dis.dis(test4)
0 LOAD_CONST 1 (0)
3 STORE_FAST 2 (mysum)
6 SETUP_LOOP 47 (to 56)
9 LOAD_GLOBAL 0 (izip)
12 LOAD_FAST 0 (u)
15 LOAD_FAST 1 (v)
18 CALL_FUNCTION 2
21 GET_ITER
22 FOR_ITER 30 (to 55)
25 UNPACK_SEQUENCE 2
28 STORE_FAST 3 (a)
31 STORE_FAST 4 (b)
34 LOAD_FAST 2 (mysum)
37 LOAD_FAST 3 (a)
40 LOAD_FAST 4 (b)
43 BINARY_SUBTRACT
44 LOAD_CONST 2 (2)
47 BINARY_POWER
48 INPLACE_ADD
49 STORE_FAST 2 (mysum)
52 JUMP_ABSOLUTE 22
55 POP_BLOCK
56 LOAD_FAST 2 (mysum)
59 RETURN_VALUE
The above represents @Lucas Malor's contribution and it's faster in a few ways. It replaces subscript operations with unpacking while reducing the number of calls to 1. This is, in many cases, as fast you're going to achieve with the constraints you've given us.
Note that it would only be worth eval
uating a run-time generated string similar to the function in test1 if you were going to call the function enough times to merit the overhead. Note also that as the length of u and v becomes increasingly large (which is typically how algorithms of this type are evaluated) the function call overhead of the other solutions becomes increasingly small and therefore, in most cases, the most readable solution is vastly superior. At the same time, even though it's slower in small cases, if the length of your sequences, u and v, may be very long, I recommend a generator expression as opposed to a list comprehension. The memory savings will cause much faster execution in most cases (and faster gc).
Overall, my recommendation is that the tiny speedup in cases of short sequences is just not worth the increase in code size and inconsistent behavior with other implementations of python you're looking at by performing micro-optimizations. The "best" solution is almost certainly test2
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With