Fast low-collision non-crypto hash in JavaScript for Files

Tags:

I am looking for a fast hash with low collisions implemented in JavaScript. It doesn't need to be a crypto hash. I am basically using it as a way to see if a given file has already been uploaded (or partially uploaded) to a user's account to save them some upload time on large (video) files.

I am using the new HTML5 File API to read in slices of the file. I then hand this off to SparkMD5 to give me a hash of the file. I like the fact that SparkMD5 allows me to do an incremental hash so I don't have to read the entire thing in memory.

Overall, SparkMD5 works for my needs but for large files it can take a while to get me my hash (about 30 seconds for a 300MB file). I would ideally like to reduce this. I am not that knowledgeable on hash functions so I am not looking to port something and am ideally looking for an already implemented library.

394

asked Apr 25 '13 22:04

Eric Anderson

1 Answers

Update (August 2021): My benchmark predates WebAssembly and is therefore out-of-date. There likely exist hash functions compiled into WebAssembly that beat the pure JS implementations. If somebody wants to update this benchmark, a pull request to my benchmark repo would be most welcome!

I benchmarked various hashing algorithms, and here's the fastest options I found:

If you only need 32 bit digests, use iMurmurHash. Note that this will give you collisions after about 2**14 (16,000) hashes.
Use SparkMD5 if you need more than 32 bits. I didn't find a fast 64 or 128 bit Murmur implementation, but SparkMD5 was surprisingly fast (75 MB/sec).
- If you need incremental updates, consider joining strings into larger chunks before feeding them to SparkMD5, as SparkMD5's incremental hashing seems to suffer from some moderate overhead.

These recommendations are for pure JavaScript. I benchmarked them with strings, but SparkMD5 takes ArrayBuffers as well.

If you want fast hashing modules on Node, your best options are slightly different:

If you are hashing Buffers: Use the built-in crypto module with the MD5 algorithm.
- The exception to this is: If you don't need incremental hashing, and you need more than 500 MB/sec throughput, and you're OK with a native npm dependency, use murmurhash-native for some extra performance. I didn't test digest sizes of less than 128 bit, as even with 128 bits the hashing is so fast that it's not likely to be a bottleneck.
  
  (Note that murmurhash-native technically supports incremental hashing, but it's slower than Node's built-in MD5 algorithm in this mode.)
If you are hashing a single string non-incrementally, convert it to a Buffer and see the preceding bullet point.
If you are hashing strings incrementally:
- If you only need 32 bits, use iMurmurHash. Note that this will give you collisions after about 2**14 (16,000) hashes.
- If you need more than 32 bits: Use the built-in crypto module with the MD5 algorithm.
  - I also recommend that you experiment with joining strings together into larger chunks, since strings are implicitly converted to Buffers when you pass them to the crypto module, and each Buffer creation is quite slow. Performance will generally be bottlenecked by Buffer creation and string joining, as the native hashing function is very fast in comparison.

191

answered Sep 28 '22 01:09

Jo Liss

Related questions
                            
                                How to turn off iPhone keypad for a specified input field on web page
                            
                                Where can I get info on the object parameter syntax for JavaScript functions?
                            
                                Node.js/ v8: How to make my own snapshot to accelerate startup
                            
                                Sending a javascript array to code behind(c#) using ajax
                            
                                Distorted InfoBox in Google Maps API
                            
                                numeric array sort()
                            
                                Why is there separate mongo.Server and mongo.Db in mongodb-native driver?
                            
                                iframe not rendering in ie9 mode when containing page is in quirks mode
                            
                                Performance penalty for undefined arguments
                            
                                Are the terms "global property" and "global variable" synonyms?
                            
                                NodeJs, Mocha and Mongoose
                            
                                javascript OCR API [closed]
                            
                                How are objects in JavaScript stored in memory?
                            
                                Why do built-in functions not have a prototype property?
                            
                                How to run a callback when Mustache.js has finished rendering template
                            
                                Why is postMessage script not working in IE8?
                            
                                what is the expected performance of ZeroMQ?
                            
                                Javascript can you serialize a File object
                            
                                Using moment.js with lang file and require.js
                            
                                What is the cost to convert a javascript variable to jQuery Object?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fast low-collision non-crypto hash in JavaScript for Files

Tags:

javascript

html

hash

fileapi

Eric Anderson

People also ask

1 Answers

Jo Liss

Recent Activity

Donate For Us