Is "join" slower in 3.x?

I was just messing around when I came across this quirk. And I wanted to make sure I am not crazy.

The following code (works in 2.x and 3.x):

from timeit import timeit
print ('gen: %s' % timeit('"-".join(str(n) for n in range(1000))', number=10000))
print ('list: %s' % timeit('"-".join([str(n) for n in range(1000)])', number=10000))

Doing 3 runs on each version, same machine.

note: I grouped the timings on the same line to save space here.

On my Python 2.7.5:

gen: 2.37875941643, 2.44095773486, 2.41718937347
list: 2.1132466183, 2.12248106441, 2.11737128131

On my Python 3.3.2:

gen: 3.8801268438439718, 3.9939604983350185, 4.166233972077624
list: 2.976764740845537, 3.0062614747229555, 3.0734980312273894

I wonder why this is. Might it have something to do with how strings are implemented?


EDIT: I ran it again without range(), since that has also changed slightly from 2.x to 3.x. Instead I used the new code below:

from timeit import timeit
print ('gen: %s' % timeit('"-".join(str(n) for n in (1, 2, 3))', number=1000000))
print ('list: %s' % timeit('"-".join([str(n) for n in (1, 2, 3)])', number=1000000))

The Timing for Python 2.7.5:

gen: 2.13911803683, 2.16418448199, 2.13403650485
list: 0.797961223325, 0.767758578433, 0.803272800119

The Timing for Python 3.3.2:

gen: 2.8188347625218486, 2.882846655874985, 3.0317612259663718
list: 1.3590610502957934, 1.4878876089869366, 1.4978070529462615

EDIT2: It seems there were still other things throwing off the measurement, so I tried cutting it down to the bare minimum.

New Code:

from timeit import timeit
print ('gen: %s' % timeit('"".join(n for n in ("1", "2", "3"))', number=1000000))
print ('list: %s' % timeit('"".join([n for n in ("1", "2", "3")])', number=1000000))

Timing Python 2.7.5:

gen: 1.47699698704, 1.46120314534, 1.48290697384
list: 0.323474182882, 0.301259632897, 0.323756694047

Timing Python 3.3.2:

gen: 1.633002954259608, 1.6049987598860562, 1.6109927662465935
list: 0.5621341113519589, 0.5789849850819431, 0.5619928557696119

The difference is clear: it is faster in 2.x and slower in 3.x, and I am curious as to why.
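
A minimal sketch of how the repeated runs could be collected in one call, assuming timeit.repeat is an acceptable stand-in for running the script three times by hand. The helper name best_of and the lower iteration count are illustrative choices, not part of the original measurements.

```python
from timeit import repeat

# The two statements from the bare-minimum test above.
GEN = '"".join(n for n in ("1", "2", "3"))'
LIST = '"".join([n for n in ("1", "2", "3")])'


def best_of(stmt, runs=3, number=100000):
    """Time `stmt` `runs` times (`number` executions each) and return the fastest run.

    Taking the minimum reduces noise from other processes on the machine.
    """
    return min(repeat(stmt, repeat=runs, number=number))


if __name__ == "__main__":
    print("gen:  %s" % best_of(GEN))
    print("list: %s" % best_of(LIST))
```

Running the same script under each interpreter gives one comparable figure per statement instead of three.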

Inbar Rose asked Jun 05 '13 15:06


2 Answers

I haven't worked with Python 3.3 yet. Everything I state below is based on observation.

I used the Python disassembler on the following code in Python 3.3 and Python 2.7.3.

s = """
''.join([n for n in ('1', '2', '3')])
"""

I found there are changes in the opcodes.

Python 2.7.3

Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> from timeit import timeit
>>> s = """
... ''.join([n for n in ('1', '2', '3')])
... """
>>> timeit(s, number=100000)
0.08443676085287867
>>>
>>>
>>> c = compile(s, '<string>', 'exec')
>>> dis.dis(c)
  2           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 BUILD_LIST               0
              9 LOAD_CONST               5 (('1', '2', '3'))
             12 GET_ITER
        >>   13 FOR_ITER                12 (to 28)
             16 STORE_NAME               1 (n)
             19 LOAD_NAME                1 (n)
             22 LIST_APPEND              2
             25 JUMP_ABSOLUTE           13
        >>   28 CALL_FUNCTION            1
             31 POP_TOP
             32 LOAD_CONST               4 (None)
             35 RETURN_VALUE
>>>

Python 3.3

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> from timeit import timeit
>>> s = """
... ''.join([n for n in ('1', '2', '3')])
... """
>>> timeit(s, number=100000)
0.13603410021487614
>>>
>>>
>>> c = compile(s, '<string>', 'exec')
>>> dis.dis(c)
  2           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 (<code object <listcomp> at 0x01F70BB0, file "<string>", line 2>)
              9 LOAD_CONST               2 ('<listcomp>')
             12 MAKE_FUNCTION            0
             15 LOAD_CONST               7 (('1', '2', '3'))
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             25 POP_TOP
             26 LOAD_CONST               6 (None)
             29 RETURN_VALUE
>>>

From the opcodes I gathered that it was the list comprehension that had changed, so I checked the list comprehension on its own in both versions.

Python 2.7.3

Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>>
>>>
>>> import dis
>>> from timeit import timeit
>>> s = """
... [i for i in ('1', '2', '3')]
... """
>>> timeit(s, number=100000)
0.059500395456104374
>>> c = compile(s, '<string>', 'exec')
>>> dis.dis(c)
  2           0 BUILD_LIST               0
              3 LOAD_CONST               4 (('1', '2', '3'))
              6 GET_ITER
        >>    7 FOR_ITER                12 (to 22)
             10 STORE_NAME               0 (i)
             13 LOAD_NAME                0 (i)
             16 LIST_APPEND              2
             19 JUMP_ABSOLUTE            7
        >>   22 POP_TOP
             23 LOAD_CONST               3 (None)
             26 RETURN_VALUE
>>>

Python 3.3

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>>
>>>
>>> import dis
>>> from timeit import timeit
>>> s = """
... [i for i in ('1', '2', '3')]
... """
>>> timeit(s, number=100000)
0.09876976988887567
>>> c = compile(s, '<string>', 'exec')
>>> dis.dis(c)
  2           0 LOAD_CONST               0 (<code object <listcomp> at 0x01FF0BB0, file "<string>", line 2>)
              3 LOAD_CONST               1 ('<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_CONST               6 (('1', '2', '3'))
             12 GET_ITER
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 POP_TOP
             17 LOAD_CONST               5 (None)
             20 RETURN_VALUE
>>>

I haven't worked with Python 3 or checked the changelog. It seems the list comprehension implementation has changed: in Python 3.3 the disassembly shows MAKE_FUNCTION and CALL_FUNCTION, so the comprehension is compiled into a separate <listcomp> code object that is turned into a function and called. (In Python 2.7 a function call is costly. I am not sure whether function calls are still costly in Python 3.3, but if they are, that could add some time.)
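
The check above can be scripted rather than read off the dis output by eye. This is only a sketch: the helper names opnames and builds_function are made up for illustration, and it relies on dis.get_instructions, which exists in Python 3.4 and later (not in 2.7 or 3.3). The result for the list comprehension depends on the interpreter version, since newer CPython releases may inline comprehensions instead of creating a function.

```python
import dis


def opnames(source):
    """Compile `source` as a module and return the set of top-level opcode names."""
    code = compile(source, "<string>", "exec")
    return {ins.opname for ins in dis.get_instructions(code)}


def builds_function(source):
    """Return True if the top-level bytecode of `source` contains MAKE_FUNCTION."""
    return "MAKE_FUNCTION" in opnames(source)


if __name__ == "__main__":
    # Version-dependent: True where the comprehension gets its own function.
    print(builds_function("[i for i in ('1', '2', '3')]"))
    # A plain `def` always creates a function object.
    print(builds_function("def f(): pass"))
```

If the first call prints True, the comprehension is paying for a function creation and call on every evaluation, which matches the suspicion above.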

Ansuman Bebarta answered Oct 06 '22 14:10


You're not comparing apples to apples.

In Python 2, str is what is called bytes in Python 3 (almost).

In Python 3, str is what is called unicode in Python 2.
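
A rough way to see whether the string type matters, assuming you run it on Python 3: time the same join on str and on bytes. The names TEXT, BYTES and time_both are illustrative only, and the timings will vary by machine and interpreter version.

```python
from timeit import timeit

# str in Python 3 is Unicode text (what Python 2 called unicode).
TEXT = '"".join(["1", "2", "3"])'
# bytes in Python 3 is the closer match to Python 2's str.
BYTES = 'b"".join([b"1", b"2", b"3"])'


def time_both(number=1000000):
    """Return the total time in seconds for `number` executions of each join."""
    return {
        "str": timeit(TEXT, number=number),
        "bytes": timeit(BYTES, number=number),
    }


if __name__ == "__main__":
    for kind, seconds in time_both().items():
        print("%s: %s" % (kind, seconds))
```

Comparing the Python 3 bytes figure against the Python 2 str figure is the closer like-for-like comparison.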

Mike Graham answered Oct 06 '22 13:10