Say, I'm going to construct a probably large dictionary in Python 3 for in-memory operations. The dictionary keys are integers, but I'm going to read them from a file as string at first.
As far as storage and retrieval are concerned, I wonder if it matters whether I store the dictionary keys as integers themselves, or as strings.
In other words, would leaving them as integers help with hashing?
Dictionary keys must be of an immutable type. Strings and numbers are the two most commonly used data types as dictionary keys. We can also use tuples as keys but they must contain only strings, integers, or other tuples.
A dictionary is 6.6 times faster than a list when we lookup in 100 items.
The keys are immutable. Just like lists, the values of dictionaries can hold heterogeneous data i.e., integers, floats, strings, NaN, Booleans, lists, arrays, and even nested dictionaries. This article will provide you a clear understanding and enable you to work proficiently with Python dictionaries.
As shown in the figure below, keys are immutable ( which cannot be changed ) data types that can be either strings or numbers. However, a key can not be a mutable data type, for example, a list. Keys are unique within a Dictionary and can not be duplicated inside a Dictionary.
Dicts are fast but can be heavy on the memory. Normally it shouldn't be a problem but you will only know when you test. I would advise to first test 1.000 lines, 10.000 lines and so on and have a look on the memory footprint.
If you run out of memory and your data structure allows it maybe try using named tuples.
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')
import csv
for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", "rb"))):
print(emp.name, emp.title)
(Example taken from the link)
If you have ascending integers you could also try to get more fancy by using the array module.
Actually the string hashing is rather efficient in Python 3. I expected this to has the opposite outcome:
>>> timeit('d["1"];d["4"]', setup='d = {"1": 1, "4": 4}')
0.05167865302064456
>>> timeit('d[1];d[4]', setup='d = {1: 1, 4: 4}')
0.06110116100171581
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With