Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

My hash table is slower than binary search

I've implemented binary search, linear search and a hash table to compare each time complexity. The problem is that somehow, my hash table is much slower than the binary search when I measure time to find prime numbers. Below is my code:

// Make the hash table 20 times the number of prime numbers
HashTable::HashTable(std::vector<int> primes)
{
    int tablesize = primes.size() * 20;
    table = new std::list<int>[tablesize];
    size = tablesize;
    for (auto &prime : primes)
        this->insert(prime);
}

// Hash function
int HashTable::hash(int key)
{
    return key % size;
}

// Finds element
int HashTable::find(int key)
{
    // Get index from hash
    int index = hash(key);

    // Find element
    std::list<int>::iterator foundelement = std::find(table[index].begin(), table[index].end(), key);


    // If element has been found return index
    // If not, return -1
    if (foundelement != table[index].end())
        return index;
    else
        return -1;
}



// Adds element to hashtable
void HashTable::insert(int element)
{
    // Get index from hash and insert the element
    int index = hash(element);
    table[index].push_back(element);
}

HashTable.h

#ifndef HASHTABLE_H
#define HASHTABLE_H

#include <list>
#include <iostream>
#include <vector>

class HashTable 
{
private:
    // Each position in Hashtable has an array of lists to store elements in case of collision
    std::list<int>* table;

    // Size of hashtable
    int size;

    // Hashfunction that returns the array location for the given key
    int hash(int key);

public:

    HashTable(int tablesize);
    HashTable(std::vector<int> primes);

    // Adds element to hashtable
    void insert(int element);

    // Deletes an element by key 
    void remove(int key);

    // Returns an element from hashtable for a given key
    int find(int key);

    // Displays the hashtable
    void printTable();

    // Display histogram to illustrate elements distribution
    void printHistogram();

    // Returns the number of lists in hash table
    int getSize();

    // Returns the total number of elements in hash table
    int getNumberOfItems();

    // De-allocates all memory used for the Hash Table.
    ~HashTable();
};

#endif

I've already tried to exceed the size of the table size to eliminate collisions but I didn't notice any difference.

This is the result

like image 249
user5310508 Avatar asked Dec 06 '15 21:12

user5310508


2 Answers

Some things that are suboptimal with the hash table implementation:

  • primes.size() * 20 is excessive - you'll get a lot more cache misses than necessary; try a range of values between 1 and ~2 to find an optimal point

  • primes.size() * 20 is always even, and all the prime numbers you hash with key % size are odd, so you never hash into half the buckets, wasting space and degrading cache performance

  • you handle collisions with linked lists: that means you're always following at least one pointer away from the table's contiguous memory, which is slow, and for collisions you jump around in memory with each node in the list; using std::vector<int> to store colliding values would cap the jumping at 1 memory area outside the hash table, or you could use closed hashing / open addressed and displacement lists to typically find the element in a nearby hash table bucket: my benchmarks have found that around an order of magnitude faster for similar int values.

like image 69
Tony Delroy Avatar answered Sep 30 '22 14:09

Tony Delroy


If your data is completely random it may be difficult to find a good constant for the modulo operation. If your data follows some sort of pattern you may want to try running through a bunch of candidate constants to see which one performs best on your data.

In this post I showed how such a large-scale test could be structured. In the end my hash table produced an average lookup in 1.5 comparisons with a worst case of 14. The table contained 16000 entries, roughly 2^14.

like image 30
Olof Forshell Avatar answered Sep 30 '22 12:09

Olof Forshell