
Python buffer copy speed - why is array slower than string?

I have a buffer object in C++ that inherits from std::vector<char>. I want to convert this buffer into a Python string so that I can send it out over the network via Twisted's protocol.transport.write.

Two ways I thought of doing this are (1) making a string and filling it char by char:

def scpychar(buf, n):
    s = ''
    for i in xrange(0, n):
        s += buf[i]
    return s

and (2) making a char array (since I know how big the buffer is), filling it, and converting it to a string:

import array

def scpyarr(buf, n):
    # Preallocate n chars, then overwrite them one by one
    a = array.array('c', '0'*n)
    for i in xrange(0, n):
        a[i] = buf[i]
    return a.tostring()

I would have thought that (1) has to make a new string object every time s += buf[i] is called, and copy the contents of the old string. So I was expecting (2) to be quicker than (1). But if I test this using timeit, I find that (1) is actually about twice as fast as (2).

I was wondering if someone could explain why (1) is faster?

Bonus points for an even more efficient way to convert from a std::vector<char> to a Python string.

Corey asked Aug 13 '13 21:08


1 Answer

CPython can sometimes optimize string += to be in-place if it can determine that no one is keeping a reference to the old string. Algorithm (1) probably triggered the optimization, so it didn't suffer the quadratic runtime it otherwise would have. However, this behavior is not guaranteed, and other Python implementations may not support it.
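To see the effect for yourself, here is a rough sketch that compares a plain += loop with one that deliberately keeps a second reference to the old string. The function names and the stand-in list of characters are made up for illustration, and the timings depend entirely on your interpreter and version, so treat this as an experiment rather than a guarantee:

```python
import timeit

def concat_refcount_one(chars):
    # Only `s` refers to the growing string, so CPython may resize it in place.
    s = ''
    for c in chars:
        s += c
    return s

def concat_with_alias(chars):
    # `keep` is a second reference to the old string when `s += c` runs,
    # so the interpreter cannot reuse the buffer and has to copy it.
    s = ''
    for c in chars:
        keep = s
        s += c
    return s

if __name__ == "__main__":
    chars = ['x'] * 20000  # hypothetical stand-in for the C++ buffer
    for fn in (concat_refcount_one, concat_with_alias):
        elapsed = timeit.timeit(lambda: fn(chars), number=5)
        print(fn.__name__, elapsed)
```

On CPython I would expect the aliased version to slow down noticeably as the input grows, but on another implementation the two may behave the same.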

Try

''.join(buf)

Unlike (1), it should offer linear-time performance on any Python implementation, and it should also be faster than (2).
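A minimal sketch of the join approach, assuming the buffer can be iterated and yields one-character strings (the list below is a hypothetical stand-in for the wrapped std::vector<char>, and join_chars is just an illustrative name):

```python
def join_chars(buf):
    # str.join takes any iterable of strings and builds the result in one go,
    # instead of growing a string one character at a time.
    return ''.join(buf)

# Hypothetical stand-in for the C++ buffer: a list of one-character strings.
buf = list("hello, world")
result = join_chars(buf)
```

If the buffer yields integers instead (as byte buffers do on Python 3), bytes(buf) would be the analogous one-shot conversion.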

user2357112 supports Monica answered Oct 21 '22 01:10