Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++: Suggestions about a hash function for a sequence of strings where the order of the strings is irrelevant

Let's say you have these two sequences of strings

abc cba bc

bc abc cba

I'm trying to create a mapping for such sequences(the sequence is also a string) so that the above two sequences are mapped into the same bucket.

My initial thought would be to add the results of a hashing function that is applied to each string separately. In this way their order won't matter. If I applied the hashing function to the sequence string as a whole, then of course the hash result would be different.

However I'm very new to the world of string hashing functions and I have no idea whether this approach would be efficient.

In this website http://www.partow.net/programming/hashfunctions/index.html

I found many different implementations for string hashing, however I'm not sure which one would be the "best" for my needs.

Some technical details about each string in the sequence is that each of them won't have more than 25 characters. Also each sequence won't have more than 3 strings.

Questions

1. Would this approach of adding the results of a string hashing function to each string of the sequence work?

2. If yes which string hashing function should I use that would give a low amount of collisions and also be time efficient?

Thank you in advance

like image 428
ksm001 Avatar asked Apr 01 '13 10:04

ksm001


1 Answers

Just the idea demonstration (very inefficient string copying), complexity O(NlogN) where N is the size of the key (=== O(1) if your keys have constant length known at compile time), I don't think you can do better complexity:

#include <boost/functional/hash.hpp>
#include <set>
#include <algorithm>

std::size_t make_hash(
  std::string const& a,
  std::string const& b,
  std::string const& c)
{
    std::string input[] = {a,b,c};
    std::sort(input, input + (sizeof(input)/sizeof(*input)));
    return boost::hash_range(input, input + (sizeof(input)/sizeof(*input)));
}

#include <iostream>
// g++ -I.../boost_1_47_0 string_set_hash.cpp
int main()
{
    std::cout << make_hash("abc", "bcd", "def") << std::endl; // 46247451276990640
    std::cout << make_hash("bcd", "def", "abc") << std::endl; // 46247451276990640
}

A fragment of boost/functional/hash.hpp for reference:

template <class T>
inline void hash_combine(std::size_t& seed, T const& v)

{
    boost::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

template <class It>
inline std::size_t hash_range(It first, It last)
{
    std::size_t seed = 0;

    for(; first != last; ++first)
    {
        hash_combine(seed, *first);
    }

    return seed;
}
like image 178
bobah Avatar answered Sep 25 '22 00:09

bobah