Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: A list that doesn't get relocated

I'm trying to optimize an algorithm in Python purely for fun / curiosity. I have a list that I'm constantly adding items and removing items from. I know that the way Python lists are implemented, Python will relocate the list in memory for you depending on its size. For example, if you have a list with 10 members, the 10 pointers will be stored consecutively in memory, but there may not be room for 100 consecutive pointers, since another program might be taking a memory block that's in the way. So as you add more members to the list, at some point Python will relocate the entire list to a different place in memory where there's more room for the list to expand.

I'm curious to know whether there is a custom data structure in Python that behaves like a list, but allows me to avoid the performance cost of doing a relocation. I'm expecting that this data type will ask me, in advance, how many members I foresee it will have, and then it'll allocate a big contiguous space in memory so it won't need to relocate the list as it slowly grows to the number of members I specified.

(Note: I tried using a Numpy array, but I had to keep a separate "needle" variable that kept the size of the list, and I'm afraid that the overhead of maintaining that needle in Python cost more than the gain.)

like image 416
Ram Rachum Avatar asked May 06 '16 12:05

Ram Rachum


People also ask

How do you check if a word is not in a list Python?

“not in” operator − This operator is used to check whether an element is not present in the passed list or not. Returns true if the element is not present in the list otherwise returns false.


1 Answers

Just for the curiosity I implemented simple benchmark to test various methods mentioned here. As people predicted the differences are quite small. That said for larger data sets pre-allocating the list and keeping track of the size is faster but this of course is only the case when adding/removing items at the end.

Benchmark

import time
from collections import deque

def normal(size):
    res = []
    for x in xrange(size):
        res.append(x)

def prealloc(size):
    res = [0] * size
    len = 0 # Separate length tracking
    for x in xrange(size):
        res[len] = x
        len += 1

def deq(size):
    res = deque()
    for x in xrange(size):
        res.append(x)

FUNCS = [
    ['list', normal],
    ['pre-allocate', prealloc],
    ['deque', deq]
]

size = 10
while size <= 10000000:
    print 'size: {0}'.format(size)
    for n, f in FUNCS:
        start = time.clock()
        for i in xrange(30):
            f(size)
        t = time.clock() - start
        print '{}: {}'.format(n, (t / 30))
    size *= 100
    print ''

Results

size: 10
list: 2.13049681786e-06
pre-allocate: 2.03719038788e-06
deque: 2.00608824456e-06

size: 1000
list: 0.000104425446219
pre-allocate: 8.72415120307e-05
deque: 0.000106369330176

size: 100000
list: 0.00984092031242
pre-allocate: 0.00825001457913
deque: 0.0101434508606

size: 10000000
list: 1.23171166758
pre-allocate: 0.876017370547
deque: 1.05051393959

Tests were run with Python 2.7.10 on Windows 8.

like image 81
niemmi Avatar answered Oct 19 '22 08:10

niemmi