Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set vs. frozenset performance

I was tinkering around with Python's set and frozenset collection types.

Initially, I assumed that frozenset would provide a better lookup performance than set, as its immutable and thus could exploit the structure of the stored items.

However, this does not seem to be the case, regarding the following experiment:

import random import time import sys  def main(n):     numbers = []     for _ in xrange(n):         numbers.append(random.randint(0, sys.maxint))     set_ = set(numbers)     frozenset_ = frozenset(set_)      start = time.time()     for number in numbers:         number in set_     set_duration = time.time() - start      start = time.time()     for number in numbers:         number in frozenset_     frozenset_duration = time.time() - start      print "set      : %.3f" % set_duration     print "frozenset: %.3f" % frozenset_duration   if __name__ == "__main__":     n = int(sys.argv[1])     main(n) 

I executed this code using both CPython and PyPy, which gave the following results:

> pypy set.py 100000000 set      : 6.156 frozenset: 6.166  > python set.py 100000000 set      : 16.824 frozenset: 17.248 

It seems that frozenset is actually slower regarding the lookup performance, both in CPython and in PyPy. Does anybody have an idea why this is the case? I did not look into the implementations.

like image 597
Sven Hager Avatar asked Apr 11 '16 17:04

Sven Hager


People also ask

What is the difference between a set and a frozenset?

sets are mutable whereas frozensets are immutable objects. That means, we can add or remove the elements in sets whereas frozensets doesn’t allow to modify once those are created. The constructor set is used to create the sets. And the constructor frozenset is used to create the frozensets.

What is a frozenset in Python?

Python frozenset is a type of set that is immutable (can't be changed). Like set, it is an unordered collection of unique objects. The objects should be hashable (such as integers, booleans, strings, tuples). Because frozenset is immutable, elements of the frozenset remain the same after creation. In Python frozensets and sets are quite the same.

Why do we need a frozenset?

The reason to use a frozenset is that it is hashable, so you can use it as a dictionary key (or as an element of another set).

What is the difference between constructor set and constructor frozenset?

The constructor set is used to create the sets. And the constructor frozenset is used to create the frozensets. Both these constructors takes a single or no argument.


1 Answers

The frozenset and set implementations are largely shared; a set is simply a frozenset with mutating methods added, with the exact same hashtable implementation. See the Objects/setobject.c source file; the top-level PyFrozenSet_Type definition shares functions with the PySet_Type definition.

There is no optimisation for a frozenset here, as there is no need to calculate the hashes for the items in the frozenset when you are testing for membership. The item that you use to test against the set still needs to have their hash calculated, in order to find the right slot in the set hashtable so you can do an equality test.

As such, your timing results are probably off due to other processes running on your system; you measured wall-clock time, and did not disable Python garbage collection nor did you repeatedly test the same thing.

Try to run your test using the timeit module, with one value from numbers and one not in the set:

import random import sys import timeit  numbers = [random.randrange(sys.maxsize) for _ in range(10000)] set_ = set(numbers) fset = frozenset(numbers) present = random.choice(numbers) notpresent = -1 test = 'present in s; notpresent in s'  settime = timeit.timeit(     test,     'from __main__ import set_ as s, present, notpresent') fsettime = timeit.timeit(     test,     'from __main__ import fset as s, present, notpresent')  print('set      : {:.3f} seconds'.format(settime)) print('frozenset: {:.3f} seconds'.format(fsettime)) 

This repeats each test 1 million times and produces:

set      : 0.050 seconds frozenset: 0.050 seconds 
like image 50
Martijn Pieters Avatar answered Sep 30 '22 20:09

Martijn Pieters