While profiling string methods in Python so that I can use the fastest one, I wrote this code to test string concatenation with files, StringIO, cStringIO and a normal string.
#!/usr/bin/env python
#title : pythonTiming.py
#description : Will be used to test timing function in python
#author : myusuf
#date : 19-11-2014
#version : 0
#usage :python pythonTiming.py
#notes :
#python_version :2.6.6
#==============================================================================
import time
import cStringIO
import StringIO

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.interval = self.end - self.start

testbuf = """ Hello This is a General String that will be repreated
This string will be written to a file , StringIO and a sregualr strin then see the best to handle string according to time
""" * 1000

MyFile = open("./testfile.txt" ,"wb+")
MyStr = ''
MyStrIo = StringIO.StringIO()
MycStrIo = cStringIO.StringIO()

def strWithFiles():
    global MyFile
    print "writing string to file "
    for index in range(1000):
        MyFile.write(testbuf)
    pass

def strWithStringIO():
    global MyStrIo
    print "writing string to StrinIO "
    for index in range(1000):
        MyStrIo.write(testbuf)

def strWithStr():
    global MyStr
    print "Writing String to STR "
    for index in range(500):
        MyStr = MyStr + testbuf

def strWithCstr():
    global MycStrIo
    print "writing String to Cstring"
    for index in range(1000):
        MycStrIo.write(testbuf)

with Timer() as t:
    strWithFiles()
print('##Request took %.03f sec.' % t.interval)

with Timer() as t:
    strWithStringIO()
print('###Request took %.03f sec.' % t.interval)

with Timer() as t:
    strWithCstr()
print('####Request took %.03f sec.' % t.interval)

with Timer() as t:
    read1 = 'x' + MyFile.read(-1)
print('file read ##Request took %.03f sec.' % t.interval)

with Timer() as t:
    read2 = 'x' + MyStrIo.read(-1)
print('stringIo read ###Request took %.03f sec.' % t.interval)

with Timer() as t:
    read3 = 'x' + MycStrIo.read(-1)
print('CString read ####Request took %.03f sec.' % t.interval)

MyFile.close()

The Python documentation says that cStringIO is faster than StringIO, but my results show that StringIO performs better at concatenation. Why?
On the other hand, reading from cStringIO is faster than reading from StringIO (it behaves much like a file). As far as I have read, both file and cStringIO are implemented in C, so why is string concatenation slow?
Is there any faster way to deal with strings than these methods?
StringIO performs better because, behind the scenes, it just keeps a list of all the strings that have been written to it and only combines them when necessary. A write operation is therefore as simple as appending an object to a list. The cStringIO module does not have this luxury: it must copy the data of each string into its buffer, resizing the buffer as and when necessary, which causes a lot of redundant copying when writing large amounts of data.
Since you are writing many large strings, StringIO has less work to do than cStringIO. When you read from a StringIO object you have written to, it can optimise the amount of copying needed by computing the sum of the lengths of the strings written to it and preallocating a buffer of that size.
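
To make the idea concrete, here is a minimal sketch of a list-backed buffer. It is a simplified illustration of the approach described above, not the actual StringIO source, and the class name `ListBackedBuffer` is made up for this example.

```python
class ListBackedBuffer(object):
    """Hypothetical, simplified write-only buffer (not the real StringIO code)."""

    def __init__(self):
        self._chunks = []   # pieces written so far, not yet combined
        self._joined = ""   # data that has already been combined

    def write(self, s):
        # A write only appends a reference to the list;
        # no character data is copied at this point.
        self._chunks.append(s)

    def getvalue(self):
        # Combine lazily: str.join sizes the result once
        # and copies each pending piece a single time.
        if self._chunks:
            self._joined += "".join(self._chunks)
            self._chunks = []
        return self._joined


buf = ListBackedBuffer()
for _ in range(3):
    buf.write("abc")
result = buf.getvalue()
```

A buffer that copies on every write has to grow and re-copy its storage as the data gets larger, whereas this sketch defers all copying to the moment the value is actually needed.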
However, StringIO is not the fastest way of joining a series of strings, because it provides additional functionality (seeking to different parts of the buffer and writing data there). If you do not need this functionality and all you want to do is join a list of strings together, then str.join is the fastest way to do this.
joined_string = "".join(testbuf for index in range(1000))
# or building the list of strings to join separately
strings = []
for i in range(1000):
    strings.append(testbuf)
joined_string = "".join(strings)
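
If you want to check this on your own machine, a rough comparison with the standard `timeit` module might look like the sketch below. It assumes Python 3, where `cStringIO` no longer exists and `io.StringIO` takes its place; `chunk`, `N` and the function names are placeholders chosen for this example. Timings depend on the interpreter version and the size of the strings, so treat the numbers as indicative only.

```python
import io
import timeit

chunk = "some text to repeat\n" * 100   # stand-in for testbuf
N = 1000                                 # number of pieces to combine


def with_join():
    return "".join(chunk for _ in range(N))


def with_stringio():
    buf = io.StringIO()
    for _ in range(N):
        buf.write(chunk)
    return buf.getvalue()


def with_plus():
    s = ""
    for _ in range(N):
        s += chunk
    return s


expected = chunk * N
results = {}
for fn in (with_join, with_stringio, with_plus):
    # Make sure every approach builds the same string before timing it.
    assert fn() == expected
    # Take the best of several runs to reduce noise from other processes.
    results[fn.__name__] = min(timeit.repeat(fn, number=3, repeat=3))
    print("%-14s %.4f sec" % (fn.__name__, results[fn.__name__]))
```

Using `timeit.repeat` and taking the minimum is usually more reliable than a single `time.time()` measurement, because it is less affected by whatever else the machine happens to be doing.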