Python cStringIO takes more time than StringIO when writing (performance of string methods)

I am profiling string methods in Python so that I can use the fastest one. I have this code to test string concatenation with files, StringIO, cStringIO and a normal string.

#!/usr/bin/env python
#title           : pythonTiming.py
#description     : Will be used to test timing function in python
#author          : myusuf
#date            : 19-11-2014
#version         : 0
#usage           :python pythonTiming.py
#notes           :
#python_version  :2.6.6  
#==============================================================================

import time
import cStringIO
import StringIO

class Timer(object):

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.interval = self.end - self.start

testbuf = """ Hello This is a General String that will be repeated
This string will be written to a file, StringIO and a regular string, then see the best way to handle strings according to time

""" * 1000

MyFile = open("./testfile.txt" ,"wb+")
MyStr  = ''
MyStrIo = StringIO.StringIO()
MycStrIo = cStringIO.StringIO()

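def strWithFiles():
    # Append testbuf to the on-disk file 1000 times.
    print "writing string to file "
    for index in range(1000):
        MyFile.write(testbuf)

def strWithStringIO():
    # Append testbuf to the pure-Python StringIO buffer 1000 times.
    print "writing string to StringIO "
    for index in range(1000):
        MyStrIo.write(testbuf)

def strWithStr():
    # Plain concatenation; rebinding MyStr requires the global declaration.
    # Note: this loop runs 500 times, not 1000, and is never timed below.
    global MyStr
    print "writing string to str "
    for index in range(500):
        MyStr = MyStr + testbuf

def strWithCstr():
    # Append testbuf to the C-implemented cStringIO buffer 1000 times.
    print "writing string to cStringIO "
    for index in range(1000):
        MycStrIo.write(testbuf)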

with Timer() as t:
    strWithFiles()
print('##Request took %.03f sec.' % t.interval)

with Timer() as t:                                                                                
    strWithStringIO()
print('###Request took %.03f sec.' % t.interval)  

with Timer() as t:                                                                                
    strWithCstr()
print('####Request took %.03f sec.' % t.interval)  

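# Rewind each object first: after the writes the position is at the end,
# so read(-1) would otherwise not return the data that was written.
MyFile.seek(0)
MyStrIo.seek(0)
MycStrIo.seek(0)

with Timer() as t:
    read1 = 'x' + MyFile.read(-1)
print('file read ##Request took %.03f sec.' % t.interval)

with Timer() as t:
    read2 = 'x' + MyStrIo.read(-1)
print('stringIo read ###Request took %.03f sec.' % t.interval)

with Timer() as t:
    read3 = 'x' + MycStrIo.read(-1)
print('CString read ####Request took %.03f sec.' % t.interval)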




MyFile.close()

  1. The Python documentation says that cStringIO is faster than StringIO, but my results show that StringIO performs better when concatenating. Why?

  2. On the other hand, reading from cStringIO is faster than reading from StringIO (it behaves much like a file). As I understand it, both file and cStringIO are implemented in C, so why is string concatenation slow?

  3. Is there a faster way to handle strings than these methods?

Muhammad Yusuf asked Nov 19 '14 07:11


1 Answer

StringIO performs better because, behind the scenes, it just keeps a list of all the strings that have been written to it and only combines them when necessary. So a write operation is as simple as appending an object to a list. The cStringIO module does not have this luxury: it must copy the data of each string into its buffer, resizing that buffer as and when necessary (which causes a lot of redundant copying when writing large amounts of data).

Since you are writing lots of large strings, StringIO has less work to do than cStringIO. When reading from a StringIO object you have written to, it can optimise the amount of copying needed by computing the sum of the lengths of the strings written to it and preallocating a buffer of that size.
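To make the write-then-combine behaviour described above concrete, here is a minimal sketch of a list-backed buffer. The class name ListBackedBuffer and its attributes are hypothetical, and this is not the actual StringIO source; it only illustrates why a write can be as cheap as a list append and why the data is copied once, when it is finally read.

```python
class ListBackedBuffer(object):
    """Hypothetical, simplified stand-in for a list-backed string buffer."""

    def __init__(self):
        self._chunks = []   # strings written so far, not yet combined
        self._joined = ""   # data that has already been combined

    def write(self, s):
        # A write only stores a reference to the string; no data is copied.
        self._chunks.append(s)

    def getvalue(self):
        # Combine lazily: str.join sizes the result once, then copies
        # each pending chunk into it a single time.
        if self._chunks:
            self._joined += "".join(self._chunks)
            self._chunks = []
        return self._joined


buf = ListBackedBuffer()
for _ in range(3):
    buf.write("abc")
result = buf.getvalue()
```

A buffer that copies on every write has to grow its storage as data arrives, which is where the extra copying mentioned above comes from.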

However, StringIO is not the fastest way of joining a series of strings. This is because it provides additional functionality (seeking to different parts of the buffer and writing data there). If this functionality is not needed and all you want to do is join a list of strings together, then str.join is the fastest way to do this.

joined_string = "".join(testbuf for index in range(1000))
# or building the list of strings to join separately
strings = []
for i in range(1000):
    strings.append(testbuf)
joined_string = "".join(strings)
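If you want to check this on your own machine, a rough timing sketch using the standard timeit module might look like the following. The chunk string, the count N and the function names are placeholders, not part of the question's script, and the numbers will vary with the interpreter version and platform.

```python
import timeit

chunk = "some text\n" * 100   # placeholder standing in for testbuf
N = 1000                      # number of pieces to combine

def with_join():
    # Join directly from a generator of pieces.
    return "".join(chunk for _ in range(N))

def with_append():
    # Collect the pieces in a list first, then join once.
    parts = []
    for _ in range(N):
        parts.append(chunk)
    return "".join(parts)

def with_concat():
    # Repeated concatenation onto a plain string.
    s = ""
    for _ in range(N):
        s += chunk
    return s

for fn in (with_join, with_append, with_concat):
    # Take the best of three repeats, each timing 10 calls.
    best = min(timeit.repeat(fn, number=10, repeat=3))
    print("%-12s %.4f sec per 10 calls" % (fn.__name__, best))
```

Taking the minimum of several repeats reduces noise from other processes, which a single time.time() measurement cannot do.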
Dunes answered Sep 20 '22 16:09