I recently recovered a ton of pictures from a friend's dead hard drive, and I decided I wanted to write a program in Python to:
Go through all the files
Check their MD5Sum
Check to see if the MD5Sum exists in a text file
If it does, let me know with "DUPLICATE HAS BEEN FOUND"
If it doesn't, add the MD5Sum to the text file.
The ultimate goal being to delete all duplicates. However, when I run this code, I get the following:
Traceback (most recent call last):
  File "C:\Users\godofgrunts\Documents\hasher.py", line 16, in <module>
    for line in myfile:
io.UnsupportedOperation: not readable
Am I doing this completely wrong or am I just misunderstanding something?
import hashlib
import os
import re

rootDir = 'H:\\recovered'
hasher = hashlib.md5()

with open('md5sums.txt', 'w') as myfile:
    for dirName, subdirList, fileList in os.walk(rootDir):
        for fname in fileList:
            with open((os.path.join(dirName, fname)), 'rb') as pic:
                buf = pic.read()
                hasher.update(buf)
            md5 = str(hasher.hexdigest())
            for line in myfile:
                if re.search("\b{0}\b".format(md5),line):
                    print("DUPLICATE HAS BEEN FOUND")
                else:
                    myfile.write(md5 +'\n')
You have opened your file in write-only mode ('w') in your with statement. To open it for both writing and reading, do:
with open('md5sums.txt', 'w+') as myfile:
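As a quick illustration of how the modes differ, here is a small sketch that writes one line to a throwaway file and then tries to read it back under 'w', 'w+' and 'r+'. The helper name demo_modes and the temporary file are made up for this example and are not part of the original script.

```python
import io
import os
import tempfile


def demo_modes():
    """Return what each mode lets us read back from a file holding one line."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Throwaway stand-in for md5sums.txt (hypothetical path).
        path = os.path.join(tmp, 'md5sums.txt')

        def reset():
            # Start every experiment from the same one-line file.
            with open(path, 'w') as f:
                f.write('abc123\n')

        # 'w' is write-only: iterating over the file raises the same
        # io.UnsupportedOperation seen in the traceback.
        reset()
        with open(path, 'w') as f:
            try:
                for _ in f:
                    pass
                results['w'] = 'readable'
            except io.UnsupportedOperation:
                results['w'] = 'not readable'

        # 'w+' is readable, but the file is truncated as soon as it is opened.
        reset()
        with open(path, 'w+') as f:
            results['w+'] = f.read()

        # 'r+' is readable and writable, and the existing contents are kept.
        reset()
        with open(path, 'r+') as f:
            results['r+'] = f.read()
    return results


print(demo_modes())
```

Running it should show 'not readable' for 'w', an empty string for 'w+', and the original line for 'r+'.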
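The correct mode is "r+", not "w+": "w+" truncates the file when it is opened, so any previously saved sums are lost, whereas "r+" keeps the existing contents (but requires the file to already exist).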
http://docs.python.org/3.3/tutorial/inputoutput.html#reading-and-writing-files
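Beyond the file mode, the script in the question has a few other problems that would get in the way of the stated goal: the single hasher object keeps accumulating data across files, "\b" in a non-raw string is a backspace rather than a word boundary, and the file is written to while it is being iterated. Below is one possible sketch of the same idea, assuming it is acceptable to keep the known sums in a set in memory and only append new ones to the text file. The function names md5_of_file and find_duplicates, and the chunk_size parameter, are my own and not from the original post.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Hash one file with a fresh md5 object, reading it in chunks."""
    hasher = hashlib.md5()  # new object per file, so hashes do not accumulate
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root_dir, sums_path='md5sums.txt'):
    """Return paths whose MD5 sum was already recorded; record new sums."""
    # Load sums saved by an earlier run, if the text file exists.
    seen = set()
    if os.path.exists(sums_path):
        with open(sums_path) as f:
            seen = {line.strip() for line in f if line.strip()}

    duplicates = []
    # 'a' appends, so earlier sums are kept and the file is created if missing.
    with open(sums_path, 'a') as out:
        for dir_name, _subdirs, file_list in os.walk(root_dir):
            for fname in file_list:
                path = os.path.join(dir_name, fname)
                md5 = md5_of_file(path)
                if md5 in seen:
                    print("DUPLICATE HAS BEEN FOUND:", path)
                    duplicates.append(path)
                else:
                    seen.add(md5)
                    out.write(md5 + '\n')
    return duplicates
```

Calling find_duplicates('H:\\recovered') would then return the list of duplicate paths. Two caveats with this sketch: the sums file should live outside the folder being scanned, and because the sums persist between runs, scanning the same folder a second time will report every file as a duplicate. It only reports duplicates; deleting them is left as a separate step so the list can be checked first.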