Hashing Similarity

Tags:

hash

Normally, the goal of hashing is to turn a continuous function into a discrete one: a small change in the input should cause a large change in the output. However, is there any hashing algorithm that will, (very) roughly speaking, return similar (but still different) hashes for similar inputs?

(An example of the use of this would be to check whether two files are "similar" by checking their hashes for similarity. Of course, some failure is always acceptable.)

214

asked Jan 29 '11 00:01

user541686

2 Answers

Look at Locality Sensitive Hashing (LSH). It is a probabilistic technique in which similar inputs are likely to get the same or nearby hashes, so it can be used, for example, to quickly find a bunch of points near a given one.
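As a rough illustration of the idea, here is a minimal sketch of SimHash, one well-known LSH scheme for text. It is not necessarily what you would deploy: the whitespace tokenization, the use of MD5 as the per-token hash, and the function names `simhash` and `hamming` are all assumptions made for this example. Inputs that share most of their tokens tend to get fingerprints that differ in only a few bits, so similarity is estimated by the Hamming distance between fingerprints.

```python
import hashlib

BITS = 64  # fingerprint width; assumption for this sketch


def simhash(text):
    """Return a 64-bit SimHash fingerprint of the whitespace-separated tokens."""
    counts = [0] * BITS
    for token in text.split():
        # Hash each token to 64 bits (MD5 is used only as a convenient mixer).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Every token votes +1 or -1 on each bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
    doc2 = "the quick brown fox jumps over the lazy cat near the river bank"
    doc3 = "completely unrelated words about compilers parsers and type systems"
    print(hamming(simhash(doc1), simhash(doc2)))  # usually small
    print(hamming(simhash(doc1), simhash(doc3)))  # usually around BITS / 2
```

Two files would then be called "similar" when the Hamming distance is below a threshold that you pick; as the question notes, some false positives and negatives have to be tolerated.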

105

answered Oct 15 '22 16:10

Jeremiah Willcock

Given a distance function that tells you how similar or different your objects are, you can also employ distance permutations: http://www.computer.org/portal/web/csdl/doi/10.1109/TPAMI.2007.70815 or sketches: http://portal.acm.org/citation.cfm?id=1638180

For an implementation of the latter approach: http://obsearch.net
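To give a feel for the distance-permutation idea, here is a minimal sketch under my own assumptions; it is not taken from the linked paper or from obsearch. Pick a few fixed reference objects, and represent each object by the order in which those references appear when sorted by distance from it. Nearby objects tend to see the references in a similar order, so two permutations can be compared cheaply, here with the Spearman footrule. The names `distance_permutation` and `spearman_footrule`, the 2-D points, and the Euclidean distance are all illustrative choices (`math.dist` needs Python 3.8+).

```python
import math


def distance_permutation(obj, references, distance):
    """Indices of the reference objects, ordered from nearest to farthest."""
    return tuple(
        sorted(
            range(len(references)),
            # Ties are broken by reference index so the result is deterministic.
            key=lambda i: (distance(obj, references[i]), i),
        )
    )


def spearman_footrule(p, q):
    """Total displacement of each reference between permutations p and q."""
    position_in_q = {ref: rank for rank, ref in enumerate(q)}
    return sum(abs(rank - position_in_q[ref]) for rank, ref in enumerate(p))


# Hypothetical reference points: the corners of a 10 x 10 square.
REFERENCES = [(0, 0), (10, 0), (0, 10), (10, 10)]

if __name__ == "__main__":
    a = distance_permutation((1, 2), REFERENCES, math.dist)
    b = distance_permutation((2, 3), REFERENCES, math.dist)
    c = distance_permutation((9, 8), REFERENCES, math.dist)
    print(a, b, c)                  # (0, 2, 1, 3) (0, 2, 1, 3) (3, 1, 2, 0)
    print(spearman_footrule(a, b))  # 0: the two nearby points get the same permutation
    print(spearman_footrule(a, c))  # 8: the far-away point gets a very different one
```

The permutation plays the role of the "similar hash": it is short, it only needs the distance function, and close objects usually produce close permutations, although this is a heuristic and not a guarantee.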

answered Oct 15 '22 16:10

Arnoldo Muller
