Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a list that exceeds the maximum size in Python

Tags:

python

list

According to this, The maximum size of a Python list on a 32 bit system is 536,870,912 elements.

Is there any possible way to initialize a list with bigger size than that?

Let's say:

list1 = [None]*1000000000
like image 932
Thanakron Tandavas Avatar asked Jan 29 '13 06:01

Thanakron Tandavas


People also ask

What is the maximum size of a python list?

According to the source code, the maximum size of a list is PY_SSIZE_T_MAX/sizeof(PyObject*) . On a regular 32bit system, this is (4294967295 / 2) / 4 or 536870912. Therefore the maximum size of a python list on a 32 bit system is 536,870,912 elements.

How do I set the maximum length of a list?

How to set a maximum length for a python list/set? One approach is to create custom class of list and inherit functionality from python list . Then in add (and maybe others) methods add a check for max length. @stalk you should post that as an answer.

How do you limit the size of an array in python?

To limit the values of the NumPy array ndarray to given range, use np. clip() or clip() method of ndarray . By specifying the minimum and maximum values in the argument, the out-of-range values are replaced with those values. This is useful when you want to limit the values to a range such as 0.0 ~ 1.0 or 0 ~ 255 .


1 Answers

Lists that large would take up a massive amount of space, as each element in the list would occupy at least 4 bytes, so just one list with maximum allowed elements would take up minimum 2GB of RAM1. And that's without even considering 64-bit systems2.

  1. 4 * 5.4e+8 = 2.1e+9, 2.1 GB
  2. 8 * 1.2e+18 = 9.2e+18, 9.2 EB (yes, exabytes)

However, for the sake of the question, let's assume that you have a lot of RAM.


One of the simplest options would be to create you own object to hold and handle the massive list. It would basically split the massive list into smaller lists, and then access them accordingly when needed. As there are no real benefits of subclassing list because you'd have to rewrite all of the methods anyway, you'd be better of just subclassing object and then go from there. The only thing that is essential here is to never combine or copy the sub-lists (because they are massive), so use itertools.chain to loop over the lists when necessary.

Here is an example of the most simple list-methods append, extend, get/setitem working:

import sys
from itertools import chain

class largeList(object):
    def __init__(self, mylist=None):
        self.maxSize = sys.maxsize/4
        self.src = [[]]
        self.extend(mylist if mylist is not None else [])

    def __iter__(self):
        return chain(*self.src)

    def __getitem__(self, idx):
        return self.src[int(idx/self.maxSize)][idx%self.maxSize]

    def __setitem__(self, idx, item):
        self.src[int(idx/self.maxSize)][idx%self.maxSize] = item
        # expand set/getitem to support negative indexing.

    def append(self, item):
        if len(self.src[-1]) < self.maxSize:
            self.src[-1].append(item)
        else:
            self.src.append([item])
    
    def extend(self, items):
        remainder = self.maxSize - len(self.src[-1])
        self.src[-1].extend(items[:remainder])
        for i in xrange(0, len(items[remainder:]), self.maxSize):
            self.src.append(items[remainder:][i:i+self.maxSize])

    def __len__(self):
        return sum(len(l) for l in self.src)
    
    def __str__(self):
        size = self.__len__()
        if size >= 8:
            first, last = [], []
            for i, ele in enumerate(self.__iter__()):
                if i < 3:
                    first.append(ele)
                if i >= size - 3:
                    last.append(ele)
            return str(first)[:-1] + ', ..., ' + str(last)[1:]
        return str(list(self.__iter__()))

Example use (if you have less than 4GB free RAM or you are on a 64-bit system, change sys.maxsize before trying this!):

#sys.maxsize = 1000

list1 = largeList(xrange(sys.maxsize/4 + 2))
print len(list1)
# 53687093
print list1
#[0, 1, 2, ..., 536870910, 536870911, 536870912]
print list1.src
#[[0, 1, 2 ..., 536870910], [536870911, 536870912]]
list1.extend([42, 43])
print list1
#[0, 1, 2, ..., 536870912, 42, 43]

The result: internally the lists are split into multiple lists, while they appear to be just one when working with them. More list functionalities can easily be added by adding more methods. E.g. pop, remove, insert, and index:

(...)

    def pop(self, idx):
        listidx = int(idx/self.maxSize)
        itempopped = self.src[listidx].pop(idx%self.maxSize)
        for i in xrange(listidx, len(self.src)-1):
            self.src[i].append(self.src[i+1].pop(0))
        if not self.src[-1]:
            del self.src[-1]
        return itempopped
        
    def remove(self, item):
        for i, ele in enumerate(self.__iter__()):
            if ele == item:
                self.pop(i)
                break
    
    def insert(self, idx, item):
        listidx = int(idx/self.maxSize)
        itemtoshift = self.src[listidx].pop(-1)
        self.src[listidx].insert(idx%self.maxSize, item)
        for i in xrange(listidx+1, len(self.src)-1):
            itemremoved = self.src[i].pop(-1)
            self.src[i].insert(0, itemtoshift)
            itemtoshift = itemremoved
        if len(self.src[-1]) < self.maxSize:
            self.src[-1].insert(0, itemtoshift)
        else:
            self.src.append([self.src[-1].pop(-1)])
            self.src[-2].insert(0, itemtoshift)
    
    def index(self, item):
        for i, ele in enumerate(self.__iter__()):
            if ele == item:
                return i
        return -1

Example use, continued:

#sys.maxsize = 1000

list1 = largeList(xrange(sys.maxsize/4 + 2))
list1.insert(0, 'a')
print list1
#['a', 0, 1, ..., 536870910, 536870911, 536870912]
list1.pop(2)
#1
list1.remove(536870910)
print list1.index('a')
#0
print len(list1)
#536870911
print list1
#['a', 0, 2, ..., 536870909, 536870911, 536870912]
print list.src
#[['a', 0, 2, ..., 536870909, 536870911], [536870912]]
like image 58
dwitvliet Avatar answered Oct 06 '22 18:10

dwitvliet