I have a long list of integers that I want to turn into an MD5 hash. What's the quickest way to do this? I have tried two options, both similar. Just wondering if I'm missing an obviously quicker method.
import random
import hashlib
import cPickle as pickle
r = [random.randrange(1, 1000) for _ in range(0, 1000000)]
def method1(r):
p = pickle.dumps(r, -1)
return hashlib.md5(p).hexdigest()
def method2(r):
p = str(r)
return hashlib.md5(p).hexdigest()
def method3(r):
p = ','.join(map(str, r))
return hashlib.md5(p).hexdigest()
Then time it in iPython:
timeit method1(r)
timeit method2(r)
timeit method3(r)
Gives me this:
In [8]: timeit method1(r)
10 loops, best of 3: 68.7 ms per loop
In [9]: timeit method2(r)
10 loops, best of 3: 176 ms per loop
In [10]: timeit method3(r)
1 loops, best of 3: 270 ms per loop
So, option 1 is the best I've got. But I have to do it a lot and it's currently the rate determining step in my code.
Any tips or tricks to get a unique hash from a big list quicker than what's here, using Python 2.7?
You may find this useful. It uses my own custom bench-marking framework (based on timeit
) to gather and print the results. Since the variations in speed are primarily due to the need to convert the r
list to something that hashlib.md5()
can work with, I've updated the suite of test cases to show how storing the values in an array.array
instead, as @DSM suggested in a comment, would dramatically speed things up. Note that since the integers in the list are all relatively small I've stored them in an array of short (2-byte) values.
from __future__ import print_function
import sys
import timeit
setup = """
import array
import random
import hashlib
import marshal
import cPickle as pickle
import struct
r = [random.randrange(1, 1000) for _ in range(0, 1000000)]
ra = array.array('h', r) # create an array of shorts equivalent
def method1(r):
p = pickle.dumps(r, -1)
return hashlib.md5(p).hexdigest()
def method2(r):
p = str(r)
return hashlib.md5(p).hexdigest()
def method3(r):
p = ','.join(map(str, r))
return hashlib.md5(p).hexdigest()
def method4(r):
fmt = '%dh' % len(r)
buf = struct.pack(fmt, *r)
return hashlib.md5(buf).hexdigest()
def method5(r):
a = array.array('h', r)
return hashlib.md5(a).hexdigest()
def method6(r):
m = marshal.dumps(r)
return hashlib.md5(m).hexdigest()
# using pre-built array...
def pb_method1(ra):
p = pickle.dumps(ra, -1)
return hashlib.md5(p).hexdigest()
def pb_method2(ra):
p = str(ra)
return hashlib.md5(p).hexdigest()
def pb_method3(ra):
p = ','.join(map(str, ra))
return hashlib.md5(p).hexdigest()
def pb_method4(ra):
fmt = '%dh' % len(ra)
buf = struct.pack(fmt, *ra)
return hashlib.md5(buf).hexdigest()
def pb_method5(ra):
return hashlib.md5(ra).hexdigest()
def pb_method6(ra):
m = marshal.dumps(ra)
return hashlib.md5(m).hexdigest()
"""
statements = {
"pickle.dumps(r, -1)": """
method1(r)
""",
"str(r)": """
method2(r)
""",
"','.join(map(str, r))": """
method3(r)
""",
"struct.pack(fmt, *r)": """
method4(r)
""",
"array.array('h', r)": """
method5(r)
""",
"marshal.dumps(r)": """
method6(r)
""",
# versions using pre-built array...
"pickle.dumps(ra, -1)": """
pb_method1(ra)
""",
"str(ra)": """
pb_method2(ra)
""",
"','.join(map(str, ra))": """
pb_method3(ra)
""",
"struct.pack(fmt, *ra)": """
pb_method4(ra)
""",
"ra (pre-built)": """
pb_method5(ra)
""",
"marshal.dumps(ra)": """
pb_method6(ra)
""",
}
N = 10
R = 3
timings = [(
idea,
min(timeit.repeat(statements[idea], setup=setup, repeat=R, number=N)),
) for idea in statements]
longest = max(len(t[0]) for t in timings) # length of longest name
print('fastest to slowest timings (Python {}.{}.{})\n'.format(*sys.version_info[:3]),
' ({:,d} calls, best of {:d})\n'.format(N, R))
ranked = sorted(timings, key=lambda t: t[1]) # sort by speed (fastest first)
for timing in ranked:
print("{:>{width}} : {:.6f} secs, rel speed {rel:>8.6f}x".format(
timing[0], timing[1], rel=timing[1]/ranked[0][1], width=longest))
Results:
fastest to slowest timings (Python 2.7.6)
(10 calls, best of 3)
ra (pre-built) : 0.037906 secs, rel speed 1.000000x
marshal.dumps(ra) : 0.177953 secs, rel speed 4.694626x
marshal.dumps(r) : 0.695606 secs, rel speed 18.350932x
pickle.dumps(r, -1) : 1.266096 secs, rel speed 33.401179x
array.array('h', r) : 1.287884 secs, rel speed 33.975950x
pickle.dumps(ra, -1) : 1.955048 secs, rel speed 51.576558x
struct.pack(fmt, *r) : 2.085602 secs, rel speed 55.020743x
struct.pack(fmt, *ra) : 2.357887 secs, rel speed 62.203962x
str(r) : 2.918623 secs, rel speed 76.996860x
str(ra) : 3.686666 secs, rel speed 97.258777x
','.join(map(str, r)) : 4.701531 secs, rel speed 124.032173x
','.join(map(str, ra)) : 4.968734 secs, rel speed 131.081303x
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With