Java resumable hash computation

I would like to implement resumable, on-the-fly hash generation for files being uploaded to the server. The files are big, so I am calling the update(byte[]) method of the MessageDigest class (as described here, for instance: How can I generate an MD5 hash?) on the fly, as new bytes arrive from the HttpServletRequest's InputStream.
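A minimal sketch of the streaming pattern described above. It assumes a plain InputStream stands in for HttpServletRequest.getInputStream(); the class and method names (StreamingDigest, digestStream, toHex) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingDigest {

    // Feeds the stream into the digest chunk by chunk, so the whole file
    // never has to be held in memory at once.
    static byte[] digestStream(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md.update(buffer, 0, read); // hash only the bytes actually read
        }
        return md.digest();
    }

    // Lower-case hex encoding of a digest.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for request.getInputStream() in a servlet.
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(digestStream(in, "MD5")));
    }
}
```

The buffer size of 8192 bytes is an arbitrary choice; any size works because update(byte[], int, int) only hashes the bytes it is given.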

Everything works well so far; it gets interesting when I want to add resumable upload support. If an upload is terminated prematurely, the incomplete file is stored on disk. However, the controller (and the underlying service) exits, so the MessageDigest object is lost. Before that happens, can I serialize the MessageDigest object to disk (or to a DB, it doesn't matter) so that, when I deserialize it again, it remembers its intermediate state? When I then resume the upload (from the exact place where it was terminated, so no bytes are redundant and none are missing) and continue calling update() on the deserialized MessageDigest, will I ultimately get the same hash as if the file had been uploaded in one go?

asked Aug 01 '12 10:08 by Michal Boska



2 Answers

Take one of the custom MD5 implementations, like this one or this one, and make it serializable or simply make its internal state public. Save the state when the upload is aborted and restore it when the upload is resumed.
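The approach above can be sketched as a from-scratch MD5 (RFC 1321) whose entire state (the four 32-bit chaining words, the unprocessed bytes of the current 64-byte block, and the total byte count) lives in ordinary fields of a Serializable class. This is not either of the linked implementations: the class ResumableMd5 and its save/load helpers are hypothetical, standard Java serialization is just one possible persistence format, and the code is meant as an illustration, so check its output against MessageDigest.getInstance("MD5") before relying on it.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical MD5 (RFC 1321) whose whole intermediate state is serializable.
public class ResumableMd5 implements Serializable {
    private static final long serialVersionUID = 1L;

    // Left-rotation amounts: SHIFTS[round][i % 4] for the four 16-step rounds.
    private static final int[][] SHIFTS = {{7, 12, 17, 22}, {5, 9, 14, 20}, {4, 11, 16, 23}, {6, 10, 15, 21}};
    // K[i] = floor(|sin(i + 1)| * 2^32), the MD5 additive constants.
    private static final int[] K = new int[64];
    static {
        for (int i = 0; i < 64; i++) K[i] = (int) (long) ((1L << 32) * Math.abs(StrictMath.sin(i + 1)));
    }

    // The complete hash state; default serialization writes all of it.
    private int a = 0x67452301, b = 0xefcdab89, c = 0x98badcfe, d = 0x10325476;
    private final byte[] buffer = new byte[64]; // bytes of the current, incomplete block
    private int bufferLen = 0;
    private long totalBytes = 0;

    public void update(byte[] data, int off, int len) {
        for (int i = off; i < off + len; i++) {
            buffer[bufferLen++] = data[i];
            if (bufferLen == 64) { processBlock(buffer); bufferLen = 0; } // full block: compress it
        }
        totalBytes += len;
    }

    // Pads, finishes and returns the 16-byte hash. Unlike MessageDigest.digest(),
    // this does not reset the object, so call it only once.
    public byte[] digest() {
        long bitLen = totalBytes * 8;
        byte[] pad = new byte[(bufferLen < 56 ? 56 : 120) - bufferLen + 8];
        pad[0] = (byte) 0x80;
        for (int i = 0; i < 8; i++) pad[pad.length - 8 + i] = (byte) (bitLen >>> (8 * i)); // little-endian
        update(pad, 0, pad.length); // padding always completes a whole number of blocks
        int[] words = {a, b, c, d};
        byte[] out = new byte[16];
        for (int i = 0; i < 16; i++) out[i] = (byte) (words[i / 4] >>> (8 * (i % 4)));
        return out;
    }

    private void processBlock(byte[] block) {
        int[] m = new int[16]; // the block as 16 little-endian 32-bit words
        for (int i = 0; i < 16; i++) {
            m[i] = (block[4 * i] & 0xff) | (block[4 * i + 1] & 0xff) << 8
                    | (block[4 * i + 2] & 0xff) << 16 | (block[4 * i + 3] & 0xff) << 24;
        }
        int aa = a, bb = b, cc = c, dd = d;
        for (int i = 0; i < 64; i++) {
            int f, g;
            if (i < 16) { f = (bb & cc) | (~bb & dd); g = i; }
            else if (i < 32) { f = (dd & bb) | (~dd & cc); g = (5 * i + 1) % 16; }
            else if (i < 48) { f = bb ^ cc ^ dd; g = (3 * i + 5) % 16; }
            else { f = cc ^ (bb | ~dd); g = (7 * i) % 16; }
            int next = bb + Integer.rotateLeft(aa + f + K[i] + m[g], SHIFTS[i / 16][i % 4]);
            aa = dd; dd = cc; cc = bb; bb = next;
        }
        a += aa; b += bb; c += cc; d += dd; // fold this block into the chaining state
    }

    static byte[] save(ResumableMd5 md5) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { out.writeObject(md5); }
        return bytes.toByteArray();
    }

    static ResumableMd5 load(byte[] saved) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(saved))) {
            return (ResumableMd5) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] file = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.US_ASCII);
        ResumableMd5 first = new ResumableMd5();
        first.update(file, 0, 20);                  // upload interrupted after 20 bytes
        byte[] savedState = save(first);            // e.g. written to disk or a DB
        ResumableMd5 resumed = load(savedState);    // restored when the upload resumes
        resumed.update(file, 20, file.length - 20); // the remaining bytes
        byte[] expected = MessageDigest.getInstance("MD5").digest(file);
        System.out.println(Arrays.equals(expected, resumed.digest()));
    }
}
```

The saved state must be written together with the partial file: totalBytes should equal the partial file's length, otherwise the resumed hash will silently disagree with the stored bytes.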

answered Oct 16 '22 00:10 by Roman Starkov



Hashes are cheap to compute (MD5 doubly so; are you sure you don't want SHA-1?). I would recommend rehashing everything from the beginning as soon as you detect that an upload has been resumed. The runtime should be low unless the uploads are truly huge, and hopefully large, interrupted uploads will be rare.
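A minimal sketch of this rehash-on-resume approach, assuming the partial upload is stored as a single file whose length equals the number of bytes already received. The class and method names (RehashOnResume, resumeDigest, continueUpload) are hypothetical, and an in-memory ByteArrayInputStream stands in for the resumed request body. It uses SHA-1 only because the answer suggests it; any algorithm name accepted by MessageDigest.getInstance works the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class RehashOnResume {

    // Rebuilds the digest state by re-reading the bytes already on disk.
    static MessageDigest resumeDigest(Path partialFile, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(partialFile)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) md.update(buf, 0, n);
        }
        return md;
    }

    // Appends the rest of the upload to the partial file while hashing it.
    static byte[] continueUpload(InputStream rest, Path partialFile, MessageDigest md) throws IOException {
        try (OutputStream out = Files.newOutputStream(partialFile, StandardOpenOption.APPEND)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = rest.read(buf)) != -1) {
                out.write(buf, 0, n);
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] whole = "hello, resumable world".getBytes(StandardCharsets.UTF_8);
        Path partial = Files.createTempFile("upload", ".part");
        Files.write(partial, Arrays.copyOfRange(whole, 0, 10)); // simulated interrupted upload
        MessageDigest md = resumeDigest(partial, "SHA-1");
        byte[] result = continueUpload(new ByteArrayInputStream(whole, 10, whole.length - 10), partial, md);
        byte[] expected = MessageDigest.getInstance("SHA-1").digest(whole);
        System.out.println(MessageDigest.isEqual(result, expected));
        Files.delete(partial);
    }
}
```

This keeps the standard MessageDigest and avoids persisting any hash internals, at the cost of re-reading the already-received bytes once per resume.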

answered Oct 16 '22 00:10 by tucuxi
