How do I append one string to another in Python?

Tags: python, string, append

I want an efficient way to append one string to another in Python, other than the following.

    var1 = "foo"
    var2 = "bar"
    var3 = var1 + var2

Is there any good built-in method to use?

564

asked Dec 14 '10 01:12

user469652

2 Answers

If you only have one reference to a string and you concatenate another string to the end, CPython now special-cases this and tries to extend the string in place.

The end result is that the operation is amortized O(n).

e.g.

    s = ""
    for i in range(n):
        s += str(i)

used to be O(n^2), but now it is O(n).
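The in-place extension is not observable from Python code: it is only attempted when no other reference to the string exists, so any other name bound to the old value keeps seeing that value. A minimal sketch of this guarantee (the variable names are illustrative only):

```python
# Strings remain immutable from the program's point of view,
# regardless of how the interpreter implements +=.
a = "foo"
b = a          # second reference to the same string object
a += "bar"     # rebinds a to the concatenated value; b is untouched

print(a)  # foobar
print(b)  # foo
```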

From the source (bytesobject.c):

    void
    PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
    {
        PyBytes_Concat(pv, w);
        Py_XDECREF(w);
    }


    /* The following function breaks the notion that strings are immutable:
       it changes the size of a string.  We get away with this only if there
       is only one module referencing the object.  You can also think of it
       as creating a new string object and destroying the old one, only
       more efficiently.  In any case, don't use this if the string may
       already be known to some other part of the code...
       Note that if there's not enough memory to resize the string, the original
       string object at *pv is deallocated, *pv is set to NULL, an "out of
       memory" exception is set, and -1 is returned.  Else (on success) 0 is
       returned, and the value in *pv may or may not be the same as on input.
       As always, an extra byte is allocated for a trailing \0 byte (newsize
       does *not* include that), and a trailing \0 byte is stored.
    */

    int
    _PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
    {
        register PyObject *v;
        register PyBytesObject *sv;
        v = *pv;
        if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
            *pv = 0;
            Py_DECREF(v);
            PyErr_BadInternalCall();
            return -1;
        }
        /* XXX UNREF/NEWREF interface should be more symmetrical */
        _Py_DEC_REFTOTAL;
        _Py_ForgetReference(v);
        *pv = (PyObject *)
            PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
        if (*pv == NULL) {
            PyObject_Del(v);
            PyErr_NoMemory();
            return -1;
        }
        _Py_NewReference(*pv);
        sv = (PyBytesObject *) *pv;
        Py_SIZE(sv) = newsize;
        sv->ob_sval[newsize] = '\0';
        sv->ob_shash = -1;          /* invalidate cached hash value */
        return 0;
    }

It's easy enough to verify empirically.

    $ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
    1000000 loops, best of 3: 1.85 usec per loop
    $ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
    10000 loops, best of 3: 16.8 usec per loop
    $ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
    10000 loops, best of 3: 158 usec per loop
    $ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
    1000 loops, best of 3: 1.71 msec per loop
    $ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
    10 loops, best of 3: 14.6 msec per loop
    $ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
    10 loops, best of 3: 173 msec per loop
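The commands above use xrange, which exists only in Python 2. On Python 3 the same kind of measurement could be sketched with the timeit module as follows. The helper name time_concat and the chosen sizes are assumptions for illustration, and unlike the commands above this version resets the string on every run. Actual timings depend on the interpreter and machine, so none are shown here.

```python
import timeit


def time_concat(n, repeat=3):
    """Return the best time in seconds for n single-character appends."""
    stmt = "s = ''\nfor i in range(n):\n    s += 'a'"
    # Taking the minimum of several runs reduces timing noise.
    return min(timeit.repeat(stmt, globals={"n": n}, number=1, repeat=repeat))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000, 100000):
        print(f"n={n:>7}: {time_concat(n) * 1e6:12.1f} usec")
```

If the time grows roughly tenfold for each tenfold increase in n, the loop is behaving linearly; a roughly hundredfold increase would point to quadratic behaviour.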

It's important to note, however, that this optimisation isn't part of the Python spec. As far as I know, it exists only in the CPython implementation. The same empirical testing on PyPy or Jython, for example, might show the older O(n**2) performance.

    $ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
    10000 loops, best of 3: 90.8 usec per loop
    $ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
    1000 loops, best of 3: 896 usec per loop
    $ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
    100 loops, best of 3: 9.03 msec per loop
    $ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
    10 loops, best of 3: 89.5 msec per loop

So far so good, but then,

    $ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
    10 loops, best of 3: 12.8 sec per loop

Ouch, that's even worse than quadratic. So PyPy is doing something that works well with short strings but performs poorly with larger ones.

123

answered Sep 22 '22 05:09

John La Rooy

Don't prematurely optimize. If you have no reason to believe there's a speed bottleneck caused by string concatenations then just stick with + and +=:

    s  = 'foo'
    s += 'bar'
    s += 'baz'

That said, if you're aiming for something like Java's StringBuilder, the canonical Python idiom is to add items to a list and then use str.join to concatenate them all at the end:

    l = []
    l.append('foo')
    l.append('bar')
    l.append('baz')

    s = ''.join(l)
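Applied to a loop that appends many pieces, the join idiom might look like the sketch below. The io.StringIO variant is not mentioned in the answer; it is included as an assumption about what a StringBuilder-style buffer could look like in Python. Both function names are hypothetical.

```python
import io


def build_with_join(n):
    # Generate the pieces, then concatenate them once at the end.
    return "".join(str(i) for i in range(n))


def build_with_stringio(n):
    # io.StringIO is an in-memory text buffer that supports write().
    buf = io.StringIO()
    for i in range(n):
        buf.write(str(i))
    return buf.getvalue()


print(build_with_join(5))      # 01234
print(build_with_stringio(5))  # 01234
```

Neither approach relies on the CPython-specific in-place resize, so both should behave similarly across Python implementations.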

answered Sep 18 '22 05:09

John Kugelman
