Python crashing while calculating SHA-1 hashes for large files on Windows

I am wondering if I could have some fresh eyes on this Python script. It works fine with small and medium-sized files, but with large ones (4-8 GB or so) it inexplicably crashes after running for a couple of minutes.

Zipped script here

Or:

import sys
import msvcrt
import hashlib

#Print the file name (and its location) to be hashed  
print 'File:  ' + str(sys.argv[1])

#Set "SHA1Hash" equal to SHA-1 hash
SHA1Hash = hashlib.sha1()

#Open file specified by "sys.argv[1]" in read only (r) and binary (b) mode
File = open(sys.argv[1], 'rb')

#Get the SHA-1 hash for the contents of the specified file
SHA1Hash.update(File.read())

#Close the file
File.close()

#Set "SHA1HashBase16" equal to the hexadecimal of "SHA1Hash"
SHA1HashBase16 = SHA1Hash.hexdigest()

#Print the SHA-1 (hexadecimal) hash of the file
print 'SHA-1: ' + SHA1HashBase16

#Make a blank line
print ' '

#Print "Press any key to continue..."
print 'Press any key to continue...'

#"Press any key to continue..." delay
char=0
while not char:
    char=msvcrt.getch()

* Updated *

Working Python script for calculating the SHA-1 hash of large files. Thanks go to Ignacio Vazquez-Abrams for pointing out what was wrong and to Tom Zych for the code.

Zipped source here

To use it, simply drag and drop the file to be hashed onto the script. Alternatively, you can run it from a command prompt:

SHA-1HashGen.py Path&File 

Where SHA-1HashGen.py is the file name of the script and Path&File is the path and file name of the file to be hashed.

Or drop the script into the SendTo folder (in Windows; shell:sendto) to get it as a right-click option.

Peter asked Apr 02 '11 01:04


2 Answers

Stop reading the file in one go; you're consuming all the memory on the system. Read it in chunks of 16 MB or so instead.

data = File.read(16 * 1024 * 1024)
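Calling read() with no argument loads the whole file into memory, while read(n) returns at most n bytes per call. As a minimal sketch of how the chunked read might be wrapped into a complete function, assuming Python 3 (the question's script is Python 2); the names sha1_of_file and chunk_size are illustrative and not taken from the original script:

```python
import hashlib

def sha1_of_file(path, chunk_size=16 * 1024 * 1024):
    """Return the hexadecimal SHA-1 of the file at `path`, read in chunks."""
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns b'' (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha1.update(chunk)
    return sha1.hexdigest()
```

With this approach, memory use is bounded by roughly chunk_size rather than by the size of the file being hashed.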

Ignacio Vazquez-Abrams answered Nov 14 '22 22:11


(In response to Peter's comment that 2 GB are left.)

I suspect Ignacio is right nonetheless. Try replacing the read/update line with this:

while True:
    buf = File.read(0x100000)
    if not buf:
        break
    SHA1Hash.update(buf)
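
Feeding the hash object one buffer at a time yields the same digest as hashing everything at once, because successive update() calls behave like a single call on the concatenated data. A small check of that property, written for Python 3 and using made-up sample data:

```python
import hashlib

data = b'example data ' * 100000  # about 1.3 MB of sample bytes

# One-shot: the whole input is hashed in a single call.
one_shot = hashlib.sha1(data).hexdigest()

# Incremental: the same bytes are fed in 64 KB pieces.
incremental = hashlib.sha1()
for start in range(0, len(data), 0x10000):
    incremental.update(data[start:start + 0x10000])

print(one_shot == incremental.hexdigest())  # True
```

So switching to the loop should not change the hashes the script reports for files it could already handle.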

Tom Zych answered Nov 14 '22 22:11