I'm trying to optimize an algorithm in Python purely for fun / curiosity. I have a list that I'm constantly adding items and removing items from. I know that the way Python lists are implemented, Python will relocate the list in memory for you depending on its size. For example, if you have a list with 10 members, the 10 pointers will be stored consecutively in memory, but there may not be room for 100 consecutive pointers, since another program might be taking a memory block that's in the way. So as you add more members to the list, at some point Python will relocate the entire list to a different place in memory where there's more room for the list to expand.
I'm curious to know whether there is a custom data structure in Python that behaves like a list, but allows me to avoid the performance cost of doing a relocation. I'm expecting that this data type will ask me, in advance, how many members I foresee it will have, and then it'll allocate a big contiguous space in memory so it won't need to relocate the list as it slowly grows to the number of members I specified.
(Note: I tried using a Numpy array, but I had to keep a separate "needle" variable that kept the size of the list, and I'm afraid that the overhead of maintaining that needle in Python cost more than the gain.)
“not in” operator − This operator is used to check whether an element is not present in the passed list or not. Returns true if the element is not present in the list otherwise returns false.
Just for the curiosity I implemented simple benchmark to test various methods mentioned here. As people predicted the differences are quite small. That said for larger data sets pre-allocating the list and keeping track of the size is faster but this of course is only the case when adding/removing items at the end.
Benchmark
import time
from collections import deque
def normal(size):
res = []
for x in xrange(size):
res.append(x)
def prealloc(size):
res = [0] * size
len = 0 # Separate length tracking
for x in xrange(size):
res[len] = x
len += 1
def deq(size):
res = deque()
for x in xrange(size):
res.append(x)
FUNCS = [
['list', normal],
['pre-allocate', prealloc],
['deque', deq]
]
size = 10
while size <= 10000000:
print 'size: {0}'.format(size)
for n, f in FUNCS:
start = time.clock()
for i in xrange(30):
f(size)
t = time.clock() - start
print '{}: {}'.format(n, (t / 30))
size *= 100
print ''
Results
size: 10
list: 2.13049681786e-06
pre-allocate: 2.03719038788e-06
deque: 2.00608824456e-06
size: 1000
list: 0.000104425446219
pre-allocate: 8.72415120307e-05
deque: 0.000106369330176
size: 100000
list: 0.00984092031242
pre-allocate: 0.00825001457913
deque: 0.0101434508606
size: 10000000
list: 1.23171166758
pre-allocate: 0.876017370547
deque: 1.05051393959
Tests were run with Python 2.7.10 on Windows 8.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With