
How to calculate a good hash code for a huge list of strings?

What is the best way to calculate a hash code from the values of all these strings in one pass?

By good I mean that it needs to be:

1 - fast: I need to compute the hash code for huge lists (10^3..10^8 items) of short strings.

2 - discriminating: the hash code must identify the whole list, so lists that differ in only a couple of strings should get different hash codes.

How can I do this in Java?

Maybe there is a way to use the existing String hash code, but how do I merge the many hash codes calculated for the separate strings?

Thank you.

asked Feb 01 '13 01:02 by Bohdan




1 Answer

Create a wrapper class for your strings and then use the CRC32 class. It's simple and fast:

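import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.zip.CRC32;

public class HugeStringCollection {
    private final Collection<String> strings;

    public HugeStringCollection(Collection<String> strings) {
        this.strings = strings;
    }

    @Override
    public int hashCode() {
        CRC32 crc = new CRC32();
        // Feed every string into one running checksum, in iteration order.
        for (String string : strings) {
            // Fixed charset so the result does not depend on the platform default.
            crc.update(string.getBytes(StandardCharsets.UTF_8));
        }

        // getValue() returns the 32-bit CRC in a long; the cast keeps those 32 bits.
        return (int) crc.getValue();
    }
}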
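
One caveat with the code above: the strings are fed back-to-back, so ["ab", "c"] and ["a", "bc"] produce the same byte stream and therefore the same checksum. A minimal sketch of one way around that, assuming it is acceptable to mix each string's byte length into the checksum before its bytes (the class name DelimitedListHash and the 4-byte length prefix are illustrative choices, not part of the original answer):

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.zip.CRC32;

// Hypothetical helper, not part of the original answer.
final class DelimitedListHash {

    // Feeds each string's byte length, then its UTF-8 bytes, into one CRC32.
    static int hash(Collection<String> strings) {
        CRC32 crc = new CRC32();
        for (String s : strings) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            int n = bytes.length;
            // 4-byte big-endian length prefix marks where each string ends.
            // update(int) uses only the low eight bits of its argument.
            crc.update(n >>> 24);
            crc.update(n >>> 16);
            crc.update(n >>> 8);
            crc.update(n);
            crc.update(bytes);
        }
        return (int) crc.getValue();
    }
}
```

With the prefix, lists that split the same characters differently feed different byte streams. A 32-bit result can still collide by chance, so treat it as a fingerprint rather than a proof of equality.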

If the collection itself is immutable, you can compute the hash once and store it for later reuse.
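
A rough sketch of that caching idea, assuming single-threaded use and that a defensive copy of the list is affordable (the class name CachedStringListHash and its fields are hypothetical; for very large lists you may prefer to wrap an already-immutable list instead of copying):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical immutable wrapper; the name is illustrative.
final class CachedStringListHash {
    private final List<String> strings;
    private int cachedHash;        // meaningful only when hashComputed is true
    private boolean hashComputed;  // not synchronized: single-threaded use assumed

    CachedStringListHash(List<String> source) {
        // Defensive copy so later changes to 'source' cannot invalidate the cache.
        this.strings = Collections.unmodifiableList(new ArrayList<>(source));
    }

    @Override
    public int hashCode() {
        if (!hashComputed) {
            // First call: one pass over all strings.
            CRC32 crc = new CRC32();
            for (String s : strings) {
                crc.update(s.getBytes(StandardCharsets.UTF_8));
            }
            cachedHash = (int) crc.getValue();
            hashComputed = true;
        }
        // Later calls return the stored value without touching the list.
        return cachedHash;
    }
}
```

If instances are used as keys in a HashMap or HashSet, equals would need to be overridden consistently as well; that part is left out here.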

answered Sep 20 '22 02:09 by mantrid