Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trade-off in Python dictionary key types

Say, I'm going to construct a probably large dictionary in Python 3 for in-memory operations. The dictionary keys are integers, but I'm going to read them from a file as string at first.

As far as storage and retrieval are concerned, I wonder if it matters whether I store the dictionary keys as integers themselves, or as strings.
In other words, would leaving them as integers help with hashing?

like image 764
Jeenu Avatar asked Dec 22 '15 10:12

Jeenu


People also ask

What is the type of dictionary keys in Python?

Dictionary keys must be of an immutable type. Strings and numbers are the two most commonly used data types as dictionary keys. We can also use tuples as keys but they must contain only strings, integers, or other tuples.

Which is faster dict or list?

A dictionary is 6.6 times faster than a list when we lookup in 100 items.

Can a dictionary hold different types Python?

The keys are immutable. Just like lists, the values of dictionaries can hold heterogeneous data i.e., integers, floats, strings, NaN, Booleans, lists, arrays, and even nested dictionaries. This article will provide you a clear understanding and enable you to work proficiently with Python dictionaries.

Are Keys in dictionaries mutable?

As shown in the figure below, keys are immutable ( which cannot be changed ) data types that can be either strings or numbers. However, a key can not be a mutable data type, for example, a list. Keys are unique within a Dictionary and can not be duplicated inside a Dictionary.


2 Answers

Dicts are fast but can be heavy on the memory. Normally it shouldn't be a problem but you will only know when you test. I would advise to first test 1.000 lines, 10.000 lines and so on and have a look on the memory footprint.

If you run out of memory and your data structure allows it maybe try using named tuples.

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')
import csv
for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", "rb"))):
    print(emp.name, emp.title)

(Example taken from the link)

If you have ascending integers you could also try to get more fancy by using the array module.

like image 103
CausticHarmony Avatar answered Sep 30 '22 18:09

CausticHarmony


Actually the string hashing is rather efficient in Python 3. I expected this to has the opposite outcome:

>>> timeit('d["1"];d["4"]', setup='d = {"1": 1, "4": 4}')
0.05167865302064456
>>> timeit('d[1];d[4]', setup='d = {1: 1, 4: 4}')
0.06110116100171581
like image 44
Klaus D. Avatar answered Sep 30 '22 18:09

Klaus D.