Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python string concatenation speed

I am testing the speed of different speed concatenation methods by concatenating the string representation from 1 to a large number(in my case, 20000000). The three methods that I am testing are:

import cProfile

count = 20000000

def profileWrapper(f):
    def wrapper(*args, **argv):
        pr = cProfile.Profile()
        pr.enable()
        string = f(*args, **argv)
        pr.create_stats()
        pr.print_stats()
        return string
    return wrapper

@profileWrapper
def naiveConcat():
    global count
    string = ''
    for i in xrange(count):
        string += `i`
    return string

@profileWrapper
def improvedConcat():
    global count
    string = []
    for i in xrange(count):
        string.append(`i`)
    return ''.join(string)

@profileWrapper
def fastestConcat():
    global count
    return ''.join([`i` for i in xrange(count)])

print 15 * "=", "naiveConcat", 15 * "="
naiveConcat()
print 15 * "=", "improvedConcat", 15 * "="
improvedConcat()
print 15 * "=", "fastestConcat", 15 * "="
fastestConcat()

I am expecting to see the improved method to be faster than the naive one, and it shouldn't' be much slower than the fastest method, but the result doesn't seem like that:

=============== naiveConcat ===============
         3 function calls in 3.951 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 cProfile.py:90(create_stats)
        1    3.951    3.951    3.951    3.951 str_concat.py:18(naiveConcat)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=============== improvedConcat ===============
         20000004 function calls in 6.990 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 cProfile.py:90(create_stats)
        1    5.196    5.196    6.990    6.990 str_concat.py:26(improvedConcat)
 20000000    1.322    0.000    1.322    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.473    0.473    0.473    0.473 {method 'join' of 'str' objects}


=============== fastestConcat ===============
         4 function calls in 3.043 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 cProfile.py:90(create_stats)
        1    2.539    2.539    3.043    3.043 str_concat.py:34(fastestConcat)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.504    0.504    0.504    0.504 {method 'join' of 'str' objects}

The improved method turned out to be much slower even than the naive method!

This doesn't make sense since the naive method are creating new binding and copying the previous string along with the concatenated string character by character ON EACH ITERATION, this method should take O(n^2), which should be much slower than the other methods which are O(n).

So what makes the improved method so slow? The only reason that I can think of is the append method, but according to this article, the append method takes O(1), so it's definitely not the reason. Then what takes so long in improvedConcat()? Thank you.

like image 998
elgoog Avatar asked Nov 02 '22 05:11

elgoog


1 Answers

The ncalls column of improvedConcat shows that you made 20000004 function calls, compared to just a few for the other algorithms. Function calls aren't super cheap in Python, so try to keep those limited. To demonstrate, I made a new test following your pattern called nop, which merely calls an empty function definition:

def foo():
    pass

@profileWrapper
def nop():
    for i in xrange(count):
        foo()
    return ''

I get similar timing results to your other test, and this NOP (No Operation) test results in

=============== nop ===============
20000003 function calls in 4.386 seconds

So you've got 4.4s of overhead making function calls.

like image 113
Joseph Sheedy Avatar answered Nov 15 '22 04:11

Joseph Sheedy