I am testing the speed of different string concatenation methods by concatenating the string representations of the integers from 1 up to a large number (in my case, 20000000). The three methods that I am testing are:
import cProfile

count = 20000000

def profileWrapper(f):
    def wrapper(*args, **argv):
        pr = cProfile.Profile()
        pr.enable()
        string = f(*args, **argv)
        pr.create_stats()
        pr.print_stats()
        return string
    return wrapper
@profileWrapper
def naiveConcat():
    global count
    string = ''
    for i in xrange(count):
        string += `i`
    return string
@profileWrapper
def improvedConcat():
    global count
    string = []
    for i in xrange(count):
        string.append(`i`)
    return ''.join(string)
@profileWrapper
def fastestConcat():
    global count
    return ''.join([`i` for i in xrange(count)])
print 15 * "=", "naiveConcat", 15 * "="
naiveConcat()
print 15 * "=", "improvedConcat", 15 * "="
improvedConcat()
print 15 * "=", "fastestConcat", 15 * "="
fastestConcat()
I am expecting the improved method to be faster than the naive one, and not much slower than the fastest method, but the results don't look like that:
=============== naiveConcat ===============
3 function calls in 3.951 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 cProfile.py:90(create_stats)
1 3.951 3.951 3.951 3.951 str_concat.py:18(naiveConcat)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
=============== improvedConcat ===============
20000004 function calls in 6.990 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 cProfile.py:90(create_stats)
1 5.196 5.196 6.990 6.990 str_concat.py:26(improvedConcat)
20000000 1.322 0.000 1.322 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.473 0.473 0.473 0.473 {method 'join' of 'str' objects}
=============== fastestConcat ===============
4 function calls in 3.043 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 cProfile.py:90(create_stats)
1 2.539 2.539 3.043 3.043 str_concat.py:34(fastestConcat)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.504 0.504 0.504 0.504 {method 'join' of 'str' objects}
The improved method turned out to be much slower than even the naive method!
This doesn't make sense: the naive method creates a new binding and copies the previous string along with the appended characters ON EACH ITERATION, so it should take O(n^2) time, which should be much slower than the other two methods, which are O(n).
So what makes the improved method so slow? The only reason I can think of is the append method, but according to this article, append is amortized O(1), so it's definitely not the reason. Then what takes so long in improvedConcat()? Thank you.
The ncalls column for improvedConcat shows that you made 20000004 function calls, compared to just a few for the other two methods. Function calls aren't cheap in Python, so try to keep them to a minimum. To demonstrate, I made a new test following your pattern called nop, which merely calls an empty function:
def foo():
    pass

@profileWrapper
def nop():
    for i in xrange(count):
        foo()
    return ''
I get timing results similar to yours for your tests, and this NOP (no-operation) test results in
=============== nop ===============
20000003 function calls in 4.386 seconds
So roughly 4.4s of the improved method's time is pure overhead from making function calls.
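To see this directly, here is a sketch of the same comparison (ported to Python 3, where `xrange` is `range` and the backtick-repr is gone, and with a hypothetical smaller count so it finishes quickly). It adds two variants: hoisting the bound `append` method into a local, which skips the repeated attribute lookup but still pays for one Python-level call per item, and `map(str, ...)`, which drives the loop from C:

```python
import timeit

count = 200000  # far smaller than the question's 20000000, so this runs quickly

def improved():
    # one Python-level parts.append() call per item -- the overhead in question
    parts = []
    for i in range(count):
        parts.append(str(i))
    return ''.join(parts)

def improved_hoisted():
    # binding parts.append to a local avoids the per-iteration attribute
    # lookup, but each iteration still makes a Python-level call
    parts = []
    append = parts.append
    for i in range(count):
        append(str(i))
    return ''.join(parts)

def with_map():
    # map(str, ...) runs the loop in C, so there is no Python-level
    # method call per item
    return ''.join(map(str, range(count)))

for f in (improved, improved_hoisted, with_map):
    t = timeit.timeit(f, number=5)
    print('%-16s %.3fs' % (f.__name__, t))
```

All three build the same string; the differences in the printed timings come entirely from how much per-item call machinery each variant goes through.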