I'm using this code to calculate hash value for a file:
m = hashlib.md5()
with open("calculator.pdf", 'rb') as fh:
while True:
data = fh.read(8192)
if not data:
break
m.update(data)
hash_value = m.hexdigest()
print hash_value
when I tried it on a folder "folder"I got
IOError: [Errno 13] Permission denied: folder
How could I calculate the hash value for a folder ?
Source Code to Find HashHash functions are available in the hashlib module. We loop till the end of the file using a while loop. On reaching the end, we get empty bytes object. In each iteration, we only read 1024 bytes (this value can be changed according to our wish) from the file and update the hashing function.
Hashing involves applying a hashing algorithm to a data item, known as the hashing key, to create a hash value. Hashing algorithms take a large range of values (such as all possible strings or all possible files) and map them onto a smaller set of values (such as a 128 bit number). Hashing has two main applications.
Use checksumdir python package available for calculating checksum/hash of directory. It's available at https://pypi.python.org/pypi/checksumdir/1.0.5
Usage :
import checksumdir
hash = checksumdir.dirhash("c:\\temp")
print hash
Here is an implementation that uses pathlib.Path instead of relying on os.walk. It sorts the directory contents before iterating so it should be repeatable on multiple platforms. It also updates the hash with the names of files/directories, so adding empty files and directories will change the hash.
Version with type annotations (Python 3.6 or above):
import hashlib
from _hashlib import HASH as Hash
from pathlib import Path
from typing import Union
def md5_update_from_file(filename: Union[str, Path], hash: Hash) -> Hash:
assert Path(filename).is_file()
with open(str(filename), "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
hash.update(chunk)
return hash
def md5_file(filename: Union[str, Path]) -> str:
return str(md5_update_from_file(filename, hashlib.md5()).hexdigest())
def md5_update_from_dir(directory: Union[str, Path], hash: Hash) -> Hash:
assert Path(directory).is_dir()
for path in sorted(Path(directory).iterdir(), key=lambda p: str(p).lower()):
hash.update(path.name.encode())
if path.is_file():
hash = md5_update_from_file(path, hash)
elif path.is_dir():
hash = md5_update_from_dir(path, hash)
return hash
def md5_dir(directory: Union[str, Path]) -> str:
return str(md5_update_from_dir(directory, hashlib.md5()).hexdigest())
Without type annotations:
import hashlib
from pathlib import Path
def md5_update_from_file(filename, hash):
assert Path(filename).is_file()
with open(str(filename), "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
hash.update(chunk)
return hash
def md5_file(filename):
return md5_update_from_file(filename, hashlib.md5()).hexdigest()
def md5_update_from_dir(directory, hash):
assert Path(directory).is_dir()
for path in sorted(Path(directory).iterdir()):
hash.update(path.name.encode())
if path.is_file():
hash = md5_update_from_file(path, hash)
elif path.is_dir():
hash = md5_update_from_dir(path, hash)
return hash
def md5_dir(directory):
return md5_update_from_dir(directory, hashlib.md5()).hexdigest()
Condensed version if you only need to hash directories:
def md5_update_from_dir(directory, hash):
assert Path(directory).is_dir()
for path in sorted(Path(directory).iterdir(), key=lambda p: str(p).lower()):
hash.update(path.name.encode())
if path.is_file():
with open(path, "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
hash.update(chunk)
elif path.is_dir():
hash = md5_update_from_dir(path, hash)
return hash
def md5_dir(directory):
return md5_update_from_dir(directory, hashlib.md5()).hexdigest()
Usage: md5_hash = md5_dir("/some/directory")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With