What is a good 64-bit hash function in Java for textual strings?

I'm looking for a hash function that:

  1. Hashes textual strings well (e.g. few collisions)
  2. Is written in Java, and widely used
  3. Bonus: works on several fields (instead of me concatenating them and applying the hash on the concatenated string)
  4. Bonus: Has a 128-bit variant.
  5. Bonus: Not CPU intensive.
asked Nov 02 '09 by ripper234


People also ask

Which string hashing is best?

If you just want a good hash function and cannot wait, djb2 is one of the best string hash functions I know. It has excellent distribution and speed on many different sets of keys and table sizes. You are not likely to do better with one of the "well known" functions such as PJW, K&R[1], etc.
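A minimal sketch of djb2 in Java, assuming the classic seed 5381 and multiplier 33; using a long accumulator to get a 64-bit result is my adaptation, not part of the quoted answer:

// djb2 (Dan Bernstein): classic seed 5381, multiplier 33.
// A long accumulator is used here for a 64-bit result; the original is 32-bit.
public static long djb2(String s) {
  long hash = 5381;
  for (int i = 0; i < s.length(); i++) {
    hash = hash * 33 + s.charAt(i); // often written as ((hash << 5) + hash) + c
  }
  return hash;
}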

Why does Java use 31 in hashCode() for String?

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
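For illustration (not part of the quoted text), the JDK documents String.hashCode() as the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated in 32-bit int arithmetic; an equivalent loop:

// Equivalent of the documented String.hashCode() contract:
// s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in int arithmetic.
public static int jdkStyleHashCode(String s) {
  int h = 0;
  for (int i = 0; i < s.length(); i++) {
    h = 31 * h + s.charAt(i);
  }
  return h;
}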

What is the approximate birthday bound for a 64-bit hash function?

In this case n = 2^64, so the Birthday Paradox formula tells you that as long as the number of keys is significantly less than Sqrt[n] = Sqrt[2^64] = 2^32, or approximately 4 billion, you don't need to worry about collisions. The higher n is, the more accurate this estimate becomes.
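A rough back-of-the-envelope sketch of that estimate, using the usual quadratic approximation p ≈ k^2 / (2n), which holds while p is small; the example key counts are mine, not part of the quoted text:

// Birthday-bound estimate for k keys hashed into n = 2^64 possible values:
// P(at least one collision) ≈ k^2 / (2n), as long as the result stays small.
public static double approxCollisionProbability(double keys) {
  double n = Math.pow(2, 64);
  return keys * keys / (2 * n);
}
// approxCollisionProbability(1e8) ≈ 0.00027  (~0.03% for 100 million keys)
// approxCollisionProbability(1e9) ≈ 0.027    (~2.7% for one billion keys)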


2 Answers

Why don't you use a long variant of the default String.hashCode() (where some really smart guys certainly put effort into making it efficient, not to mention the thousands of developer eyes that have already looked at this code)?

// adapted from String.hashCode()
public static long hash(String string) {
  long h = 1125899906842597L; // prime
  int len = string.length();

  for (int i = 0; i < len; i++) {
    h = 31 * h + string.charAt(i);
  }
  return h;
}

If you're looking for even more bits, you could probably use a BigInteger (one possible reading of that is sketched below).
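A minimal sketch of that idea: the same 31-based polynomial, carried out in BigInteger arithmetic and reduced mod 2^128 to keep a 128-bit result. The seed and the modulus reduction are illustrative choices of mine, not something the answer spells out:

import java.math.BigInteger;

public class BigHash {
  private static final BigInteger MOD = BigInteger.ONE.shiftLeft(128); // 2^128
  private static final BigInteger THIRTY_ONE = BigInteger.valueOf(31);

  // 128-bit polynomial hash, same structure as hash(String) above
  public static BigInteger hash128(String string) {
    BigInteger h = BigInteger.valueOf(1125899906842597L); // same prime seed as above
    for (int i = 0; i < string.length(); i++) {
      h = h.multiply(THIRTY_ONE).add(BigInteger.valueOf(string.charAt(i))).mod(MOD);
    }
    return h;
  }
}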

Edit:

As I mentioned in a comment on @brianegge's answer, there are not many use cases for hashes with more than 32 bits, and most likely not a single one for hashes with more than 64 bits:

I could imagine a huge hash table distributed across dozens of servers, maybe storing tens of billions of mappings. For such a scenario, @brianegge still has a valid point here: 32 bits allow for 2^32 (ca. 4.3 billion) different hash keys. Assuming a strong algorithm, you should still have quite few collisions. With 64 bits (18,446,744,073 billion different keys) you're certainly safe, regardless of whatever crazy scenario you need it for. Thinking of use cases for 128-bit keys (340,282,366,920,938,463,463,374,607,431 billion possible keys) is pretty much impossible, though.

To combine the hashes of several fields, multiply one with a prime and add them (rather than XORing them; see below):

long hash = MyHash.hash(string1) * 31 + MyHash.hash(string2); 

The small prime is in there to avoid equal hash codes for switched values, i.e. {'foo','bar'} and {'bar','foo'} aren't equal and should have different hash codes. XOR is bad because it returns 0 if both values are equal; {'foo','foo'} and {'bar','bar'} would therefore end up with the same hash code.
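One way to generalize that prime-multiply-and-add combination to any number of fields (bonus requirement 3); MyHash.hash is the 64-bit string hash from the code above, and combine is just an illustrative name:

// Prime-multiply-and-add combination for an arbitrary number of fields.
public static long combine(String... fields) {
  long h = 1;
  for (String field : fields) {
    h = 31 * h + MyHash.hash(field);
  }
  return h;
}
// combine("foo", "bar") != combine("bar", "foo"), unlike a plain XOR of both hashes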

answered by sfussenegger


An answer for today (2018): SipHash.

It will be much faster than most of the answers here, and significantly higher quality than all of them.

The Guava library has one: https://google.github.io/guava/releases/23.0/api/docs/com/google/common/hash/Hashing.html#sipHash24--
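A short usage sketch against Guava's com.google.common.hash API (Guava 15 or later); the example strings and the extra int field are just placeholders:

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

public class SipHashExample {
  public static void main(String[] args) {
    HashFunction sip = Hashing.sipHash24(); // keyed variant: Hashing.sipHash24(k0, k1)

    // single string -> 64-bit hash
    long h1 = sip.hashString("hello world", StandardCharsets.UTF_8).asLong();

    // several fields without manual concatenation (bonus requirement 3)
    long h2 = sip.newHasher()
        .putString("field1", StandardCharsets.UTF_8)
        .putInt(42)
        .hash()
        .asLong();

    System.out.println(h1 + " " + h2);
  }
}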

answered by Scott Carey