Due to this, mixing the StringBuilder and + method of concatenation is considered bad practice. Additionally, String concatenation using the + operator within a loop should be avoided. Since the String object is immutable, each call for concatenation will result in a new String object being created.
Concatenation is the process of appending one string to the end of another string. You concatenate strings by using the + operator. For string literals and string constants, concatenation occurs at compile time; no run-time concatenation occurs. For string variables, concatenation occurs only at run time.
Concat() or + is the best way.
If you concatenate Stings in loops for each iteration a new intermediate object is created in the String constant pool. This is not recommended as it causes memory issues.
There is nothing wrong in concatenating two strings with +
. Indeed it's easier to read than ''.join([a, b])
.
You are right though that concatenating more than 2 strings with +
is an O(n^2) operation (compared to O(n) for join
) and thus becomes inefficient. However this has not to do with using a loop. Even a + b + c + ...
is O(n^2), the reason being that each concatenation produces a new string.
CPython2.4 and above try to mitigate that, but it's still advisable to use join
when concatenating more than 2 strings.
Plus operator is perfectly fine solution to concatenate two Python strings. But if you keep adding more than two strings (n > 25) , you might want to think something else.
''.join([a, b, c])
trick is a performance optimization.
The assumption that one should never, ever use + for string concatenation, but instead always use ''.join may be a myth. It is true that using +
creates unnecessary temporary copies of immutable string object but the other not oft quoted fact is that calling join
in a loop would generally add the overhead of function call
. Lets take your example.
Create two lists, one from the linked SO question and another a bigger fabricated
>>> myl1 = ['A','B','C','D','E','F']
>>> myl2=[chr(random.randint(65,90)) for i in range(0,10000)]
Lets create two functions, UseJoin
and UsePlus
to use the respective join
and +
functionality.
>>> def UsePlus():
return [myl[i] + myl[i + 1] for i in range(0,len(myl), 2)]
>>> def UseJoin():
[''.join((myl[i],myl[i + 1])) for i in range(0,len(myl), 2)]
Lets run timeit with the first list
>>> myl=myl1
>>> t1=timeit.Timer("UsePlus()","from __main__ import UsePlus")
>>> t2=timeit.Timer("UseJoin()","from __main__ import UseJoin")
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000)
2.48 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000)
2.61 usec/pass
>>>
They have almost the same runtime.
Lets use cProfile
>>> myl=myl2
>>> cProfile.run("UsePlus()")
5 function calls in 0.001 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.001 0.001 <pyshell#1376>:1(UsePlus)
1 0.000 0.000 0.001 0.001 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {range}
>>> cProfile.run("UseJoin()")
5005 function calls in 0.029 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.015 0.015 0.029 0.029 <pyshell#1388>:1(UseJoin)
1 0.000 0.000 0.029 0.029 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5000 0.014 0.000 0.014 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.000 0.000 {range}
And it looks that using Join, results in unnecessary function calls which could add to the overhead.
Now coming back to the question. Should one discourage the use of +
over join
in all cases?
I believe no, things should be taken into consideration
And off-course in a development pre-mature optimization is evil.
When working with multiple people, it's sometimes difficult to know exactly what's happening. Using a format string instead of concatenation can avoid one particular annoyance that's happened a whole ton of times to us:
Say, a function requires an argument, and you write it expecting to get a string:
In [1]: def foo(zeta):
...: print 'bar: ' + zeta
In [2]: foo('bang')
bar: bang
So, this function may be used pretty often throughout the code. Your coworkers may know exactly what it does, but not necessarily be fully up-to-speed on the internals, and may not know that the function expects a string. And so they may end up with this:
In [3]: foo(23)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/izkata/<ipython console> in <module>()
/home/izkata/<ipython console> in foo(zeta)
TypeError: cannot concatenate 'str' and 'int' objects
There would be no problem if you just used a format string:
In [1]: def foo(zeta):
...: print 'bar: %s' % zeta
...:
...:
In [2]: foo('bang')
bar: bang
In [3]: foo(23)
bar: 23
The same is true for all types of objects that define __str__
, which may be passed in as well:
In [1]: from datetime import date
In [2]: zeta = date(2012, 4, 15)
In [3]: print 'bar: ' + zeta
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/izkata/<ipython console> in <module>()
TypeError: cannot concatenate 'str' and 'datetime.date' objects
In [4]: print 'bar: %s' % zeta
bar: 2012-04-15
So yes: If you can use a format string do it and take advantage of what Python has to offer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With