
Python equivalent of Java StringBuffer?

Is there anything in Python like Java's StringBuffer? Since strings are immutable in Python too, editing them in loops would be inefficient.

asked Nov 12 '13 at 10:11 by user2902773





1 Answer

Python 3

From the docs:

Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. To get a linear runtime cost, you must switch to one of the alternatives below: if concatenating str objects, you can build a list and use str.join() at the end or else write to an io.StringIO instance and retrieve its value when complete
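A minimal sketch of the two alternatives the docs name, for Python 3. The function names `build_with_join` and `build_with_stringio` are illustrative only and do not come from the docs:

```python
from io import StringIO


def build_with_join(parts):
    """Collect the pieces in a list, then join them once at the end."""
    chunks = []
    for part in parts:
        chunks.append(str(part))
    return ''.join(chunks)


def build_with_stringio(parts):
    """Write the pieces to an in-memory text buffer, then read it back."""
    buf = StringIO()
    for part in parts:
        buf.write(str(part))
    return buf.getvalue()


print(build_with_join(range(5)))             # 01234
print(build_with_stringio(['a', 'b', 'c']))  # abc
```

Both build the result in a single pass over the pieces, which is what keeps the total cost linear.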

Experiment to compare runtime of several options:

    import sys
    import timeit
    from array import array
    from io import StringIO


    def test_concat():
        # Naive repeated concatenation
        out_str = ''
        for _ in range(loop_count):
            out_str += 'abc'
        return out_str


    def test_join_list_loop():
        # Append to a list in a loop, join once at the end
        str_list = []
        for _ in range(loop_count):
            str_list.append('abc')
        return ''.join(str_list)


    def test_array():
        # Accumulate raw bytes in an array, decode once at the end
        char_array = array('b')
        for _ in range(loop_count):
            char_array.frombytes(b'abc')
        # tostring() was removed in Python 3.9, and str() on bytes
        # returns the repr ("b'abc...'"), so decode the bytes instead.
        return char_array.tobytes().decode('ascii')


    def test_string_io():
        # Write to an in-memory text buffer
        file_str = StringIO()
        for _ in range(loop_count):
            file_str.write('abc')
        return file_str.getvalue()


    def test_join_list_compr():
        return ''.join(['abc' for _ in range(loop_count)])


    def test_join_gen_compr():
        return ''.join('abc' for _ in range(loop_count))


    loop_count = 80000

    print(sys.version)

    # Time every function whose name starts with 'test_'
    res = {}
    for k, v in dict(globals()).items():
        if k.startswith('test_'):
            res[k] = timeit.timeit(v, number=10)

    # Print the timings, fastest first
    for k, v in sorted(res.items(), key=lambda x: x[1]):
        print('{:.5f} {}'.format(v, k))

Results:

    3.7.5 (default, Nov  1 2019, 02:16:32)
    [Clang 11.0.0 (clang-1100.0.33.8)]
    0.03738 test_join_list_compr
    0.05681 test_join_gen_compr
    0.09425 test_string_io
    0.09636 test_join_list_loop
    0.11976 test_concat
    0.19267 test_array

Python 2

Efficient String Concatenation in Python is a rather old article. Its main claim, that naive concatenation is far slower than joining, no longer holds, because CPython has optimized this case since then. From the docs:

CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations.

I've adapted their code a bit and got the following results on my machine:

    from cStringIO import StringIO
    from UserString import MutableString
    from array import array

    import sys, timeit

    def method1():
        out_str = ''
        for num in xrange(loop_count):
            out_str += `num`
        return out_str

    def method2():
        out_str = MutableString()
        for num in xrange(loop_count):
            out_str += `num`
        return out_str

    def method3():
        char_array = array('c')
        for num in xrange(loop_count):
            char_array.fromstring(`num`)
        return char_array.tostring()

    def method4():
        str_list = []
        for num in xrange(loop_count):
            str_list.append(`num`)
        out_str = ''.join(str_list)
        return out_str

    def method5():
        file_str = StringIO()
        for num in xrange(loop_count):
            file_str.write(`num`)
        out_str = file_str.getvalue()
        return out_str

    def method6():
        out_str = ''.join([`num` for num in xrange(loop_count)])
        return out_str

    def method7():
        out_str = ''.join(`num` for num in xrange(loop_count))
        return out_str


    loop_count = 80000

    print sys.version

    print 'method1=', timeit.timeit(method1, number=10)
    print 'method2=', timeit.timeit(method2, number=10)
    print 'method3=', timeit.timeit(method3, number=10)
    print 'method4=', timeit.timeit(method4, number=10)
    print 'method5=', timeit.timeit(method5, number=10)
    print 'method6=', timeit.timeit(method6, number=10)
    print 'method7=', timeit.timeit(method7, number=10)

Results:

    2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
    method1= 0.171155929565
    method2= 16.7158739567
    method3= 0.420584917068
    method4= 0.231794118881
    method5= 0.323612928391
    method6= 0.120429992676
    method7= 0.145267963409

Conclusions:

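  • join over a list comprehension (method6) still wins over concat (method1), but marginally; join over a list built in a loop (method4) was slower than concat in this run
  • list comprehensions are faster than loops (when building a list)
  • joining generators is slower than joining lists
  • the other methods (MutableString, array, StringIO) are of no use here (unless you're doing something special)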
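For anyone who wants an interface closer to Java's StringBuffer, here is a minimal sketch for Python 3 that wraps the list-and-join approach. The class name `StringBuffer` and its `append` method are hypothetical; they are not part of the standard library and were not part of the benchmarks above:

```python
class StringBuffer:
    """Hypothetical helper: a text accumulator backed by a list of chunks."""

    def __init__(self, initial=''):
        self._chunks = [initial] if initial else []

    def append(self, text):
        # Store the piece; no string is copied until str() is called.
        self._chunks.append(str(text))
        return self  # returning self allows chained calls, as in Java

    def __len__(self):
        return sum(len(chunk) for chunk in self._chunks)

    def __str__(self):
        # Join once, then keep the joined result as the only chunk so
        # repeated str() calls do not redo the work.
        joined = ''.join(self._chunks)
        self._chunks = [joined]
        return joined


sb = StringBuffer()
for i in range(3):
    sb.append(i).append(',')
print(sb)  # 0,1,2,
```

Whether this wrapper is worth having over a plain list and `''.join()` is a matter of taste; it adds a method call per append and has not been timed here.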
answered Oct 09 '22 at 03:10 (5 revs, 4 users 65%)
