
Python string join performance

There are a lot of articles around the web about Python performance. One of the first things you read is that strings should not be concatenated with '+': avoid s1 + s2 + s3 and use str.join instead.

I tried the following: joining two strings into a directory path, using three approaches:

  1. '+', which I should not do
  2. str.join
  3. os.path.join

Here is my code:

import os, time

s1 = '/part/one/of/dir'
s2 = 'part/two/of/dir'
N = 10000

t = time.clock()
for i in xrange(N):
    s = s1 + os.sep + s2
print time.clock() - t

t = time.clock()
for i in xrange(N):
    s = os.sep.join((s1, s2))
print time.clock() - t

t = time.clock()
for i in xrange(N):
    s = os.path.join(s1, s2)
print time.clock() - t

Here are the results (Python 2.5 on Windows XP):

0.0182201927899
0.0262544541275
0.120238186697

Shouldn't it be exactly the other way around?

Danny asked Jan 24 '09 22:01


1 Answer

Most of the performance issues with string concatenation are ones of asymptotic performance, so the differences become most significant when you are concatenating many long strings.
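
A small sketch of the asymptotic case, assuming Python 3: it builds one long string from many pieces, once with repeated '+' and once with str.join. The helper names concat_plus, concat_join and compare are hypothetical, chosen for illustration. Repeated '+' can copy everything built so far on every step, while CPython's str.join sizes the result once and copies each piece once. Be aware that CPython can sometimes resize a string in place for a = a + b or a += b, which may hide much of the difference; PEP 8 warns that this optimization is fragile and absent in other implementations, so do not rely on it.

```python
import timeit


def concat_plus(pieces):
    """Build one string with repeated '+'; each step may copy the whole result so far."""
    result = ""
    for piece in pieces:
        result = result + piece
    return result


def concat_join(pieces):
    """Build one string with str.join; CPython allocates the result once."""
    return "".join(pieces)


def compare(count, piece="x" * 10, repeat=3):
    """Return (best '+' time, best join time) in seconds for `count` pieces."""
    pieces = [piece] * count
    t_plus = min(timeit.repeat(lambda: concat_plus(pieces), number=1, repeat=repeat))
    t_join = min(timeit.repeat(lambda: concat_join(pieces), number=1, repeat=repeat))
    return t_plus, t_join


if __name__ == "__main__":
    # Growing counts show how each approach scales; absolute numbers vary by machine.
    for count in (1_000, 10_000, 20_000):
        t_plus, t_join = compare(count)
        print(f"{count:>6} pieces: '+' {t_plus:.6f} s, join {t_join:.6f} s")
```

The interesting output is how each column grows as the piece count increases, not any single timing.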

In your sample, you perform the same short concatenation many times. You never build up a long string, so the cost of repeatedly copying a growing result never appears; what you are measuring is the fixed overhead of each expression. s1 + os.sep + s2 is handled directly by the interpreter's addition operator, while os.sep.join((s1, s2)) also has to look up the join method, build a tuple, and make a method call, and os.path.join is a more complex function still. (os.path.join does a lot of checking on the strings to see whether they need to be rewritten in any way before they are concatenated. This sacrifices some performance for the sake of portability.)
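
The question's benchmark no longer runs on current Python: xrange and the print statement are Python 2 only, and time.clock was deprecated in 3.3 and removed in 3.8. A minimal Python 3 rewrite with the timeit module might look like the sketch below. The names SETUP, STATEMENTS and benchmark are illustrative, not part of the original post. Timings depend on machine and interpreter version, so the ordering you see may differ from the 2009 numbers.

```python
import timeit

# Same inputs as the question; timeit runs SETUP once before timing each statement.
SETUP = "import os; s1 = '/part/one/of/dir'; s2 = 'part/two/of/dir'"

# The three approaches from the question, as statements for timeit.
STATEMENTS = {
    "plus": "s1 + os.sep + s2",
    "str.join": "os.sep.join((s1, s2))",
    "os.path.join": "os.path.join(s1, s2)",
}


def benchmark(number=10000, repeat=5):
    """Return the best-of-`repeat` time in seconds for `number` runs of each statement."""
    results = {}
    for name, stmt in STATEMENTS.items():
        # Taking the minimum of several repeats reduces noise from other processes.
        results[name] = min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>12}: {seconds:.6f} s")
```

Using statement strings rather than lambdas keeps an extra function call out of each measurement, so the comparison stays close to the original loops.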

By the way, since file paths are usually short, you almost certainly want to use os.path.join for the sake of portability. If the performance of path concatenation is a problem, you're doing something very odd with your filesystem.
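
To make the portability point concrete, here is a small sketch. On CPython, os.path is posixpath on POSIX systems and ntpath on Windows; importing both directly shows each platform's behaviour on any OS. The example paths other than the question's are made up for illustration.

```python
import ntpath
import posixpath

base = '/part/one/of/dir'
rel = 'part/two/of/dir'

# Each platform uses its own separator.
print(posixpath.join(base, rel))      # /part/one/of/dir/part/two/of/dir
print(ntpath.join('C:\\dir', 'sub'))  # C:\dir\sub

# os.path.join inspects its arguments instead of blindly gluing them together:
print(posixpath.join('/a/', 'b'))     # /a/b   (no doubled separator)
print('/a/' + '/' + 'b')              # /a//b  (naive '+')
print(posixpath.join('/a', '/b'))     # /b     (an absolute component restarts the path)
```

These checks are exactly the extra work that makes os.path.join slower than '+' in the question's timing loop.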

user57368 answered Sep 22 '22 17:09