
Multithreaded MD5 Checksum in Python

I have a python script that recursively walks a specified directory, and checksums each file it finds. It then writes a log file which lists all file paths and their md5 checksums.

Run sequentially, this takes a long time for 50,000 files at 15 MB each. However, my computer has far more resources available than the script actually uses. How can I adjust my approach so that the script uses more resources and runs faster?

For example, could I split my file list into thirds and run a thread for each, giving me roughly a 3x speedup?

I'm not very comfortable with threading, and I hope someone wouldn't mind whipping up an example for my case.

Here's the code for my sequential md5 loop:

# md5file() is my helper that reads a file and returns its MD5 digest
for root, dirs, files in os.walk(root_path):
    for filename in files:
        file_path = os.path.join(root, filename)
        md5_pairs.append([file_path, md5file(file_path, 128)])

Thanks for your help in advance!

asked Apr 12 '12 by Jamie


1 Answer

For this kind of work, I think multiprocessing.Pool would give you fewer surprises - check the examples and docs at http://docs.python.org/library/multiprocessing.html
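
A minimal sketch of that idea, adapted to the question's directory walk. The function names (md5file, checksum_tree) and the worker count are my own choices, not anything from the original post; Pool.map handles splitting the file list across worker processes for you, so there is no need to chunk it into thirds by hand:

```python
import hashlib
import os
from multiprocessing import Pool

def md5file(file_path, chunk_size=128 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def checksum(file_path):
    # Worker function: must be a top-level function so it can be
    # pickled and sent to the pool's worker processes.
    return (file_path, md5file(file_path))

def checksum_tree(root_path, workers=4):
    """Walk root_path and checksum every file using a process pool."""
    paths = [
        os.path.join(root, filename)
        for root, dirs, files in os.walk(root_path)
        for filename in files
    ]
    with Pool(workers) as pool:
        # pool.map distributes the paths across the workers and
        # collects the (path, digest) pairs in order.
        return pool.map(checksum, paths)
```

Because hashing is mostly disk- and CPU-bound, processes (rather than threads) sidestep the GIL; with 15 MB files the per-task overhead of pickling a path string is negligible.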

answered Oct 11 '22 by jsbueno