 

Complexity of the internal hash() function in Python

Tags:

python

big-o

hash

I understand that the average case of hash-table lookup is O(1), but does this include the time it takes to compute the hash value of the given input itself? I've tried looking for the answer on Google and read the relevant docs, but could not find the implementation of the built-in hash() function in Python. Some websites state that computing the hash value takes constant time; others say it is O(k), where k is the length of the input. I would be happy if you could help me find the correct answer. Thanks in advance :)

asked Jul 14 '18 at 11:07 by Noy





2 Answers

It depends entirely on the type being hashed. Here are some simple tests in CPython 2.7.13 (other implementations and versions may behave differently):

>>> import pprint, timeit
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n="a"*{}'.format(6400*i), number=1)) for i in range(16)])
[(0, 1.9073486328125e-06),
 (1, 1.6927719116210938e-05),
 (2, 3.314018249511719e-05),
 (3, 4.887580871582031e-05),
 (4, 6.4849853515625e-05),
 (5, 8.106231689453125e-05),
 (6, 9.679794311523438e-05),
 (7, 0.00011301040649414062),
 (8, 0.00012993812561035156),
 (9, 0.00014495849609375),
 (10, 0.00016188621520996094),
 (11, 0.0001780986785888672),
 (12, 0.00019288063049316406),
 (13, 0.0002090930938720703),
 (14, 0.000225067138671875),
 (15, 0.00024199485778808594)]
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n="a"*{}'.format(6400*i))) for i in range(16)])
[(0, 0.09920382499694824),
 (1, 0.09032988548278809),
 (2, 0.09069585800170898),
 (3, 0.09006309509277344),
 (4, 0.09059309959411621),
 (5, 0.09033513069152832),
 (6, 0.09037399291992188),
 (7, 0.09031510353088379),
 (8, 0.09110498428344727),
 (9, 0.09074902534484863),
 (10, 0.0909719467163086),
 (11, 0.09081602096557617),
 (12, 0.09112405776977539),
 (13, 0.09091711044311523),
 (14, 0.09103798866271973),
 (15, 0.09085893630981445)]

Note how hashing a freshly created string is O(n) (first run, number=1), but each string caches its hash after the first computation, so repeated hashing amortises to constant time (second run, which uses timeit's default of number=1000000).
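The caching effect can also be checked directly by timing the first and second hash of the same string. This is a minimal sketch, assuming CPython 3 (where a str object stores its hash after the first call); hash_timings is a hypothetical helper name, and absolute numbers will vary by machine.

```python
import time


def hash_timings(length):
    """Return (first, repeated) wall-clock times in seconds for hashing
    a freshly built str of the given length. Hypothetical helper."""
    s = "a" * length          # built at run time, so its hash is not cached yet
    t0 = time.perf_counter()
    hash(s)                   # first call: walks every character, O(length)
    t1 = time.perf_counter()
    hash(s)                   # repeated call: CPython returns the cached value
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


for n in (10**5, 10**6, 10**7):
    first, repeated = hash_timings(n)
    print("len=%9d  first=%.2e s  repeated=%.2e s" % (n, first, repeated))
```

The first column should grow roughly tenfold per row, while the repeated column stays flat.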

>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n=2**{}'.format(64*i))) for i in range(16)])
[(0, 0.09280180931091309),
 (1, 0.09100484848022461),
 (2, 0.09413909912109375),
 (3, 0.09609699249267578),
 (4, 0.10647201538085938),
 (5, 0.1146399974822998),
 (6, 0.12569880485534668),
 (7, 0.1291029453277588),
 (8, 0.13350296020507812),
 (9, 0.1369338035583496),
 (10, 0.14037799835205078),
 (11, 0.14420413970947266),
 (12, 0.1485278606414795),
 (13, 0.15162205696105957),
 (14, 0.15520405769348145),
 (15, 0.15993809700012207)]

Hashing a long is also O(n), where n is the width of the number, so it is logarithmic in its magnitude. The granularity is that of one internal digit, typically 30 bits (base 2**30), which lets smaller ints be used directly as their own hash.
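The linear-in-width cost can be modelled in pure Python. This is a minimal sketch of the scheme documented for Python 3 in the "Hashing of numeric types" section of the standard library docs, not the 2.7 code timed above: an int's hash is its value reduced modulo the prime P = sys.hash_info.modulus, with -1 replaced by -2. The reduction is done here one 30-bit chunk at a time (Horner's rule), so the work grows with the number of chunks. The chunk size and the helper name int_hash are assumptions for illustration; any chunk size gives the same result.

```python
import sys

P = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit CPython builds
SHIFT = 30                 # assumed chunk size, mirroring CPython's usual digit width


def int_hash(n):
    """Hypothetical pure-Python model of hash() for ints."""
    m = abs(n)
    chunks = []
    while m:
        chunks.append(m & ((1 << SHIFT) - 1))  # lowest SHIFT bits
        m >>= SHIFT
    h = 0
    for chunk in reversed(chunks):             # most significant chunk first
        h = ((h << SHIFT) + chunk) % P         # one reduction step per chunk
    if n < 0:
        h = -h                                 # hash(-x) == -hash(x) ...
    if h == -1:
        h = -2                                 # ... except -1, which CPython reserves for errors
    return h


for n in (0, 1, -1, 2**64, -(10**40)):
    print(n, int_hash(n), hash(n))
```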

Other objects will have their own behaviour, for instance tuples will roughly sum the hash time of their contents.
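One way to see that a tuple's hash is built from its elements' hashes is to count calls. A minimal sketch with a hypothetical CountingKey class whose __hash__ increments a counter: hashing a tuple of k such objects runs k element hashes, so the tuple's hash cost is roughly the sum of the element costs.

```python
class CountingKey:
    """Hypothetical key type that records how often __hash__ runs."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


items = tuple(CountingKey(i) for i in range(5))
CountingKey.calls = 0
hash(items)                 # the tuple hash combines the hash of each element
print(CountingKey.calls)    # 5: one element hash per item
```

Unlike str, instances of a class like this do not cache their hash, so every dict lookup using such a key runs __hash__ again.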

answered Oct 21 '22 at 03:10 by Yann Vernier



Here is a small test I wrote to check the hypothesis. The results don't seem to depend on the length of the input.

import datetime

x = ['a','aa','aaaaaaaaaaaaaaaaaaaaaaaaaaaa','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa']
for s in x:
    for t in x:
        print("Checking for: " + s + " " + t)
        a = datetime.datetime.now()
        h = hash((s, t))  # hash a 2-tuple built from the two strings
        b = datetime.datetime.now()
        c = b - a
        # c.microseconds is only the sub-second part; fine for these tiny durations
        print("Time taken : " + str(c.microseconds))

Results

Checking for: a a
Time taken : 87
Checking for: a aa
Time taken : 10
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa a
Time taken : 9
Checking for: aa aa
Time taken : 8
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 10
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 11
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 10
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
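A caveat on this measurement: identifier-like string literals such as the ones in x are typically interned by CPython when the code is compiled, and interning computes and caches their hash, so the loop may be timing little more than building and combining the tuple. A minimal sketch that builds the strings at run time instead and times only the first hash((s, t)) call, using timeit with number=1 as in the other answer; tuple_hash_time is a hypothetical helper name, and timings will vary by machine.

```python
import timeit


def tuple_hash_time(n):
    """Seconds for a single hash((s, t)) where s and t are freshly built
    strings of length n. Hypothetical helper."""
    # The strings are created in setup at run time (for large n CPython does
    # not constant-fold the product), so neither has a cached hash yet.
    setup = 's = "a" * {0}; t = "b" * {0}'.format(n)
    # number=1 times only the first call, before the string hashes are cached.
    return timeit.timeit("hash((s, t))", setup=setup, number=1)


for n in (10, 10**4, 10**6, 10**7):
    print(n, tuple_hash_time(n))
```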
answered Oct 21 '22 at 03:10 by Vizag
