How can I calculate a hash for a filesystem-directory using Python?

I'm using this code to calculate the hash value of a file:

import hashlib

m = hashlib.md5()
with open("calculator.pdf", 'rb') as fh:
    # read in 8 KiB chunks so large files are not loaded into memory at once
    while True:
        data = fh.read(8192)
        if not data:
            break
        m.update(data)

hash_value = m.hexdigest()
print(hash_value)

When I tried it on a folder named "folder", I got:

IOError: [Errno 13] Permission denied: folder

How can I calculate the hash value for a folder?

asked Jul 24 '14 15:07

2 Answers

Use the checksumdir Python package, which calculates a checksum/hash of a directory. It's available at https://pypi.python.org/pypi/checksumdir/1.0.5

Usage:

import checksumdir

# dir_hash avoids shadowing the built-in hash()
dir_hash = checksumdir.dirhash("c:\\temp")
print(dir_hash)
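If installing a package is not an option, the same idea can be sketched with the standard library alone. The following is a minimal sketch and not checksumdir's actual implementation: the helper name walk_dir_hash and its parameters are made up for illustration. It assumes every file under the directory is readable, and it feeds both relative paths and file contents into the digest.

```python
import hashlib
import os


def walk_dir_hash(directory, algorithm="md5", chunk_size=8192):
    """Hypothetical helper: hash relative paths and file contents under `directory`."""
    digest = hashlib.new(algorithm)
    for root, dirs, files in os.walk(directory):
        # Sorting in place fixes both the order we hash in and the order
        # os.walk descends into subdirectories.
        dirs.sort()
        files.sort()
        rel_root = os.path.relpath(root, directory)
        for name in dirs:
            # Include directory names so that empty directories affect the result.
            rel_path = os.path.join(rel_root, name).replace(os.sep, "/")
            digest.update(rel_path.encode("utf-8"))
        for name in files:
            # Use "/" as the separator so the digest does not depend on the OS.
            rel_path = os.path.join(rel_root, name).replace(os.sep, "/")
            digest.update(rel_path.encode("utf-8"))
            with open(os.path.join(root, name), "rb") as fh:
                for chunk in iter(lambda: fh.read(chunk_size), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

Usage: dir_hash = walk_dir_hash("c:\\temp")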

answered Sep 21 '22 18:09

Here is an implementation that uses pathlib.Path instead of relying on os.walk. It sorts the directory contents before iterating, so the result should be repeatable across platforms. It also updates the hash with the names of files and directories, so adding empty files or empty directories changes the hash.

Version with type annotations (Python 3.6 or above):

import hashlib
from _hashlib import HASH as Hash
from pathlib import Path
from typing import Union


def md5_update_from_file(filename: Union[str, Path], hash: Hash) -> Hash:
    assert Path(filename).is_file()
    with open(str(filename), "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash.update(chunk)
    return hash


def md5_file(filename: Union[str, Path]) -> str:
    return str(md5_update_from_file(filename, hashlib.md5()).hexdigest())


def md5_update_from_dir(directory: Union[str, Path], hash: Hash) -> Hash:
    assert Path(directory).is_dir()
    for path in sorted(Path(directory).iterdir(), key=lambda p: str(p).lower()):
        hash.update(path.name.encode())
        if path.is_file():
            hash = md5_update_from_file(path, hash)
        elif path.is_dir():
            hash = md5_update_from_dir(path, hash)
    return hash


def md5_dir(directory: Union[str, Path]) -> str:
    return str(md5_update_from_dir(directory, hashlib.md5()).hexdigest())

Without type annotations:

import hashlib
from pathlib import Path


def md5_update_from_file(filename, hash):
    assert Path(filename).is_file()
    with open(str(filename), "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash.update(chunk)
    return hash


def md5_file(filename):
    return md5_update_from_file(filename, hashlib.md5()).hexdigest()


def md5_update_from_dir(directory, hash):
    assert Path(directory).is_dir()
    for path in sorted(Path(directory).iterdir()):
        hash.update(path.name.encode())
        if path.is_file():
            hash = md5_update_from_file(path, hash)
        elif path.is_dir():
            hash = md5_update_from_dir(path, hash)
    return hash


def md5_dir(directory):
    return md5_update_from_dir(directory, hashlib.md5()).hexdigest()

Condensed version if you only need to hash directories:

import hashlib
from pathlib import Path


def md5_update_from_dir(directory, hash):
    assert Path(directory).is_dir()
    for path in sorted(Path(directory).iterdir(), key=lambda p: str(p).lower()):
        hash.update(path.name.encode())
        if path.is_file():
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(4096), b""):
                    hash.update(chunk)
        elif path.is_dir():
            hash = md5_update_from_dir(path, hash)
    return hash


def md5_dir(directory):
    return md5_update_from_dir(directory, hashlib.md5()).hexdigest()

Usage: md5_hash = md5_dir("/some/directory")
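To check the two properties claimed above (the result is repeatable, and adding an empty file changes it), here is a small self-contained sketch that runs against a temporary directory. The names update_from_dir and dir_digest are hypothetical variants of the functions above, not part of any library. Two deliberate differences, both assumptions of this sketch: the algorithm is passed by name via hashlib.new, since MD5 is not collision-resistant and something like SHA-256 may be preferable where that matters, and entries are sorted by exact name so that two names differing only in case cannot tie in the sort.

```python
import hashlib
import tempfile
from pathlib import Path


def update_from_dir(directory, digest):
    """Feed entry names and file contents under `directory` into `digest`."""
    # Sort by exact name so the order does not depend on what iterdir() returns.
    for path in sorted(Path(directory).iterdir(), key=lambda p: p.name):
        digest.update(path.name.encode())
        if path.is_file():
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(4096), b""):
                    digest.update(chunk)
        elif path.is_dir():
            update_from_dir(path, digest)
    return digest


def dir_digest(directory, algorithm="sha256"):
    """Return the hex digest of a directory tree (hypothetical helper)."""
    return update_from_dir(directory, hashlib.new(algorithm)).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "readme.txt").write_bytes(b"hello")

    before = dir_digest(root)
    unchanged_matches = before == dir_digest(root)  # same tree hashed twice

    (root / "empty.txt").touch()  # add an empty file
    changed_matches = before == dir_digest(root)

print(unchanged_matches)  # True: hashing the same tree twice is repeatable
print(changed_matches)    # False: the new file's name was fed into the digest
```

Passing algorithm="md5" to dir_digest selects MD5, the algorithm used by the functions above, although the digests will not necessarily equal md5_dir's because the sort key differs.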

answered Sep 19 '22 18:09

danmou


user3832061


Mangu Singh Rajpurohit
