Dictionary, set or frozenset?

I have a large collection of data, about 10 million entries, and part of my program requires a great many membership checks:

def is_member(a, data):
    # same result as: if a in data: return True, otherwise return False
    return a in data

Right now data is a dictionary whose values are all equal to '1'.

I also have a program that uses an algorithm to work out the same information. For now it is slower than the dictionary method, but I expect the size of data to keep growing.

For my current dictionary solution, would it be faster if data were a frozenset or a set (or something else)?

And for the future, to find out when I need to switch to my program: does anyone know how the speed of a membership check scales with the size of a hash-based container? Is a dictionary with 1 billion entries still fast?

lonewarrior556 asked Nov 16 '13 07:11

2 Answers

In principle

If you expect the data to keep growing, you can't use a frozenset, because a frozenset cannot be modified once it has been created.
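A minimal sketch of that constraint, using made-up sample entries: a frozenset has no mutating methods, so "adding" an element means building a whole new object.

```python
data = frozenset({"alice", "bob"})  # hypothetical sample entries

# frozenset exposes no mutating methods such as add() or remove()
has_add = hasattr(data, "add")

# "Growing" a frozenset builds a brand-new object via union
bigger = data | {"carol"}

print(has_add)             # False
print("carol" in data)     # False: the original is unchanged
print("carol" in bigger)   # True
```

For 10 million entries, rebuilding the whole container on every addition is the cost that makes a plain set the more practical choice when the data grows.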

A set would take less storage than a dictionary when all you need is to test whether an element exists. Lookup speed would be similar to a dictionary's, since dictionary keys and set elements are both hashed for storage and are always unique. If you don't need data associated with each entry, use a set.
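One way to check this on your own machine is a rough measurement like the one below. It is only a sketch: the size n, the variable names, and the use of integer keys are assumptions for illustration, and the numbers vary with Python version and hardware. Average-case hash lookups do not depend on the number of entries, so the three containers should come out close to each other.

```python
import sys
import timeit

n = 100_000  # illustrative size, far below the 10 million in the question
as_dict = {i: 1 for i in range(n)}   # current approach: every value is 1
as_set = set(as_dict)                # same keys, no values
as_frozenset = frozenset(as_dict)    # immutable variant

# getsizeof reports the container's own table, not the key objects it shares
sizes = {
    "dict": sys.getsizeof(as_dict),
    "set": sys.getsizeof(as_set),
    "frozenset": sys.getsizeof(as_frozenset),
}

containers = {"dict": as_dict, "set": as_set, "frozenset": as_frozenset}
timings = {}
for name, container in containers.items():
    # time 100,000 lookups of a key that is present
    timings[name] = timeit.timeit(lambda: (n - 1) in container, number=100_000)

for name in containers:
    print(f"{name}: {sizes[name]} bytes, {timings[name]:.4f} s per 100,000 lookups")
```

Rerunning this with a larger n gives a feel for how memory, rather than lookup time, becomes the limiting factor.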

Practically speaking...

When you are dealing with that many entries, move the data to a database. You will eventually run out of memory trying to load all of it. With a database, you can issue a specific query to check membership. Seriously. Put that data in a database.
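As a rough sketch of what such a membership query could look like, here is a version using Python's built-in sqlite3 module. The table name entries, the column name value, the helper in_database, and the sample rows are all assumptions for illustration; the answer does not name a particular database.

```python
import sqlite3

# In-memory database for the demo; real data would live in a file or on a server
conn = sqlite3.connect(":memory:")

# PRIMARY KEY gives the column an index, so lookups do not scan the whole table
conn.execute("CREATE TABLE entries (value TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO entries (value) VALUES (?)", [("alice",), ("bob",)])
conn.commit()

def in_database(conn, value):
    """Return True if value is stored in the (hypothetical) entries table."""
    row = conn.execute(
        "SELECT 1 FROM entries WHERE value = ? LIMIT 1", (value,)
    ).fetchone()
    return row is not None

print(in_database(conn, "alice"))  # True
print(in_database(conn, "zed"))    # False
```

Each check here is a round trip to the database, so it will typically be slower per lookup than an in-memory set; the trade is bounded memory use as the data grows.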

RyPeck answered Oct 02 '22 08:10

For this amount of data RyPeck is right - a DB will do the job much better.

One more point: something in what you've written seems odd to me. If you use a dictionary to store the members, why is the value of each key-value pair '1'? Shouldn't each pair be "id of a" mapped to "a", where 'a' is the object?

yuvalb9 answered Oct 02 '22 10:10
