Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python sets versus arrays

I need to store a large list of numbers in memory. I will then need to check for membership. Arrays are better than lists for memory efficiency. Sets are better than lists for membership checking. I need both! So my questions are:

1) How much more memory efficient are arrays than sets? (For the converse, see my results below). 2) Is there a data structure which strikes a better balance between sets and arrays? Something like a set with a signed integer type? Or some numpy construct?

I checked out the membership timing difference with the script below. (I know timeit is better but the variance is low enough for time to be fine):

import array
import time 

class TimerContext:
    def __enter__(self):
        self.t0 = time.time()
    def __exit__(self, *args, **kwargs):
        print(time.time()-self.t0)

SIZE = 1000000

l = list([i for i in range(SIZE)])
a = array.array('I', l)
s = set(l)

print(type(l))
print(type(a))
print(type(s))

with TimerContext():
    x = 99999 in l
with TimerContext():
    x = 99999 in a
with TimerContext():
    x = 99999 in s

Results:

<class 'list'>
<class 'array.array'>
<class 'set'>
0.0012176036834716797
0.0024595260620117188
1.430511474609375e-06

So sets are a LOT faster for membership checking (please note the scientific notation). So if their memory footprint isn't that different to the array, I'll prefer to use a set. But I don't know how to check the memory footprint.

I should also add that there are a lot of questions comparing sets and lists. But I didn't see any good answers comparing arrays and sets.

like image 977
Neil Avatar asked Apr 19 '19 18:04

Neil


People also ask

What is the difference between array and set in Python?

Difference Between Array and SetArray is faster than set in terms of initialization. Set is slower than an array in terms of initialization because it uses a hash process. The array allows to store duplicate elements in it. Set doesn't allow to store duplicate elements in it.

Is set faster than array Python?

Generally the lists are faster than sets. But in the case of searching for an element in a collection, sets are faster because sets have been implemented using hash tables. So basically Python does not have to search the full set, which means that the time complexity in average is O(1).

Are sets better than lists Python?

Lists are slightly faster than sets when you just want to iterate over the values. Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.

What is the difference between set and tuples in Python?

The tuples refer to the collections of various objects of Python separated by commas between them. The sets are an unordered collection of data types. These are mutable, iterable, and do not consist of any duplicate elements.


Video Answer


1 Answers

If it's possible in your case, bisect performance comes close to set for membership checks (both with list and array). See results below

import array
from bisect import bisect
import sys
import time


class TimerContext:
    def __enter__(self):
        self.t0 = time.time()

    def __exit__(self, *args, **kwargs):
        print(time.time() - self.t0)


def get_size_in_megabytes(iterable):
    return round(sys.getsizeof(iterable) / (1024 ** 2), 2)


SIZE = 1000000

l = list([i for i in range(SIZE)])
a = array.array("I", l)
s = set(l)

print(type(l), get_size_in_megabytes(l))
print(type(a), get_size_in_megabytes(a))
print(type(s), get_size_in_megabytes(s))

with TimerContext():
    x = 99999 in l
with TimerContext():
    x = 99999 in a
with TimerContext():
    x = 99999 in s

print("list bisect")
with TimerContext():
    bisect(l, 99999)

print("array bisect")
with TimerContext():
    bisect(a, 99999)

Results:

<class 'list'> 8.58
<class 'array.array'> 3.81
<class 'set'> 32.0
0.0024390220642089844
0.0053005218505859375
3.814697265625e-06
list bisect
9.298324584960938e-06
array bisect
6.198883056640625e-06

Credits for sys.getsizeof ussage to @CristiFati.

like image 134
Dušan Maďar Avatar answered Oct 18 '22 20:10

Dušan Maďar