Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I append one string to another in Python?

I want an efficient way to append one string to another in Python, other than the following.

var1 = "foo" var2 = "bar" var3 = var1 + var2 

Is there any good built-in method to use?

like image 564
user469652 Avatar asked Dec 14 '10 01:12

user469652


People also ask

How do I append a string?

Concatenation is the process of appending one string to the end of another string. You concatenate strings by using the + operator. For string literals and string constants, concatenation occurs at compile time; no run-time concatenation occurs. For string variables, concatenation occurs only at run time.

How do I concatenate two strings in a list in Python?

You can concatenate a list of strings into a single string with the string method, join() . Call the join() method from 'String to insert' and pass [List of strings] . If you use an empty string '' , [List of strings] is simply concatenated, and if you use a comma , , it makes a comma-delimited string.


2 Answers

If you only have one reference to a string and you concatenate another string to the end, CPython now special cases this and tries to extend the string in place.

The end result is that the operation is amortized O(n).

e.g.

s = "" for i in range(n):     s+=str(i) 

used to be O(n^2), but now it is O(n).

From the source (bytesobject.c):

void PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w) {     PyBytes_Concat(pv, w);     Py_XDECREF(w); }   /* The following function breaks the notion that strings are immutable:    it changes the size of a string.  We get away with this only if there    is only one module referencing the object.  You can also think of it    as creating a new string object and destroying the old one, only    more efficiently.  In any case, don't use this if the string may    already be known to some other part of the code...    Note that if there's not enough memory to resize the string, the original    string object at *pv is deallocated, *pv is set to NULL, an "out of    memory" exception is set, and -1 is returned.  Else (on success) 0 is    returned, and the value in *pv may or may not be the same as on input.    As always, an extra byte is allocated for a trailing \0 byte (newsize    does *not* include that), and a trailing \0 byte is stored. */  int _PyBytes_Resize(PyObject **pv, Py_ssize_t newsize) {     register PyObject *v;     register PyBytesObject *sv;     v = *pv;     if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {         *pv = 0;         Py_DECREF(v);         PyErr_BadInternalCall();         return -1;     }     /* XXX UNREF/NEWREF interface should be more symmetrical */     _Py_DEC_REFTOTAL;     _Py_ForgetReference(v);     *pv = (PyObject *)         PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);     if (*pv == NULL) {         PyObject_Del(v);         PyErr_NoMemory();         return -1;     }     _Py_NewReference(*pv);     sv = (PyBytesObject *) *pv;     Py_SIZE(sv) = newsize;     sv->ob_sval[newsize] = '\0';     sv->ob_shash = -1;          /* invalidate cached hash value */     return 0; } 

It's easy enough to verify empirically.

 $ python -m timeit -s"s=''" "for i in xrange(10):s+='a'" 1000000 loops, best of 3: 1.85 usec per loop $ python -m timeit -s"s=''" "for i in xrange(100):s+='a'" 10000 loops, best of 3: 16.8 usec per loop $ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'" 10000 loops, best of 3: 158 usec per loop $ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'" 1000 loops, best of 3: 1.71 msec per loop $ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'" 10 loops, best of 3: 14.6 msec per loop $ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'" 10 loops, best of 3: 173 msec per loop 

It's important however to note that this optimisation isn't part of the Python spec. It's only in the cPython implementation as far as I know. The same empirical testing on pypy or jython for example might show the older O(n**2) performance .

 $ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'" 10000 loops, best of 3: 90.8 usec per loop $ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'" 1000 loops, best of 3: 896 usec per loop $ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'" 100 loops, best of 3: 9.03 msec per loop $ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'" 10 loops, best of 3: 89.5 msec per loop 

So far so good, but then,

 $ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'" 10 loops, best of 3: 12.8 sec per loop 

ouch even worse than quadratic. So pypy is doing something that works well with short strings, but performs poorly for larger strings.

like image 123
John La Rooy Avatar answered Sep 22 '22 05:09

John La Rooy


Don't prematurely optimize. If you have no reason to believe there's a speed bottleneck caused by string concatenations then just stick with + and +=:

s  = 'foo' s += 'bar' s += 'baz' 

That said, if you're aiming for something like Java's StringBuilder, the canonical Python idiom is to add items to a list and then use str.join to concatenate them all at the end:

l = [] l.append('foo') l.append('bar') l.append('baz')  s = ''.join(l) 
like image 23
John Kugelman Avatar answered Sep 18 '22 05:09

John Kugelman