
Avoiding unnecessary slice copying in Python

Is there a common idiom for avoiding pointless slice copying for cases like this:

>>> a = bytearray(b'hello')
>>> b = bytearray(b'goodbye, cruel world.')
>>> a.extend(b[14:20])
>>> a
bytearray(b'hello world')

It seems to me that there is an unnecessary copy happening when the b[14:20] slice is created. Rather than create a new slice in memory to give to extend I want to say "use only this range of the current object".

Some methods will help you out with slice parameters, for example count:

>>> a = bytearray(1000000)       # a million zero bytes
>>> a[0:900000].count(b'\x00')   # expensive temporary slice
900000
>>> a.count(b'\x00', 0, 900000)  # helpful start and end parameters
900000

but many, like extend in my first example, don't have this feature.
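For reference, count is not the only bytearray method that takes start and end parameters: find, index, startswith and endswith accept them as well. A small sketch follows; the variable names are just for illustration, and the indices reuse the b[14:20] range from the first example.

```python
data = bytearray(b'goodbye, cruel world.')

# Each call restricts its work to a range of data without building a slice.
n_o = data.count(b'o', 14, 20)          # occurrences of b'o' in data[14:20]
pos = data.find(b'world', 14, 20)       # index is relative to data, not the range
starts = data.startswith(b'world', 15)  # test the prefix from offset 15
ends = data.endswith(b'world', 0, 20)   # test the suffix of data[0:20]

print(n_o, pos, starts, ends)  # 1 15 True True
```

None of this helps with extend itself, which only accepts the object to append.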

I realise that for many applications what I'm talking about would be a micro-optimisation, so before anyone asks - yes, I have profiled my application, and it is something worth worrying about for my case.

I have one 'solution' below, but any better ideas are most welcome.

Scott Griffiths asked Feb 24 '10 17:02



2 Answers

In Python 2, creating a buffer object avoids copying the slice, although for short slices it's more efficient to just make the copy:

>>> a.extend(buffer(b, 14, 6))
>>> a
bytearray(b'hello world')

Here the memory is copied only once (into a), but for a slice this short the cost of creating the buffer object more than cancels out the saving. It should pay off for larger slices, though I'm not sure how large the slice has to be before this method is more efficient overall.

Note that buffer no longer exists in Python 3, so there you'd use a memoryview object instead (memoryview is also available in Python 2.7):

>>> a.extend(memoryview(b)[14:20])
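To find where the break-even point lies on a given machine, a rough timing sketch along these lines may help. The function names, the slice sizes and the repeat count are assumptions chosen for illustration, and the numbers will vary with the Python version and hardware, so no output is shown.

```python
import timeit

def extend_with_copy(src, start, stop):
    """Extend a fresh bytearray using an ordinary slice (makes a temporary copy)."""
    dst = bytearray()
    dst.extend(src[start:stop])
    return dst

def extend_with_view(src, start, stop):
    """Extend a fresh bytearray using a memoryview slice (no temporary copy)."""
    dst = bytearray()
    dst.extend(memoryview(src)[start:stop])
    return dst

src = bytearray(1_000_000)

# Compare both approaches for a short, a medium and a large slice.
for size in (6, 1_000, 100_000):
    t_copy = timeit.timeit(lambda: extend_with_copy(src, 0, size), number=200)
    t_view = timeit.timeit(lambda: extend_with_view(src, 0, size), number=200)
    print(f"{size:>7} bytes: copy {t_copy:.5f}s, view {t_view:.5f}s")
```

Both functions build the same result; only the way the source range is handed to extend differs.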
Scott Griffiths answered Oct 11 '22 08:10



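itertools has islice, which iterates over a range of a sequence without building a slice. An islice object has no count method, so it's useful in the other cases where you want to avoid the copy; as you pointed out, count already has start and end parameters for that. Bear in mind that islice still steps through the items before the start index and yields one item at a time, so what it saves is memory rather than time.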

>>> from itertools import islice
>>> a = bytearray(1000000)
>>> sum(1 for x in islice(a,0,900000) if x==0)
900000
>>> len(list(filter((0).__eq__, islice(a, 0, 900000))))  # items are ints; filter is lazy in Python 3
900000

>>> a = bytearray(b'hello')
>>> b = bytearray(b'goodbye, cruel world.')
>>> a.extend(islice(b,14,20))
>>> a
bytearray(b'hello world')
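The two approaches can be combined. The sketch below uses a hypothetical helper, extend_from_range (not part of any library), which tries a memoryview first and falls back to islice for sources that don't support the buffer protocol, such as a list of ints. It assumes dst is a bytearray and that start and stop are non-negative.

```python
from itertools import islice

def extend_from_range(dst, src, start, stop):
    """Hypothetical helper: append src[start:stop] to dst without a temporary slice."""
    try:
        view = memoryview(src)  # works for bytes-like objects such as bytes and bytearray
    except TypeError:
        # Generic fallback: lazily iterate over the range, one item at a time.
        dst.extend(islice(src, start, stop))
    else:
        # Slicing a memoryview creates a new view, not a copy of the data.
        dst.extend(view[start:stop])
    return dst

a = bytearray(b'hello')
b = bytearray(b'goodbye, cruel world.')
extend_from_range(a, b, 14, 20)
print(a)  # bytearray(b'hello world')
```

One caveat: src and dst should be different objects, because a bytearray cannot be resized while a memoryview of it is still alive.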
John La Rooy answered Oct 11 '22 07:10
