I've implemented binary search, linear search, and a hash table to compare their time complexities. The problem is that, somehow, my hash table is much slower than the binary search when I measure the time it takes to find prime numbers. Below is my code:
// Make the hash table 20 times the number of prime numbers
HashTable::HashTable(std::vector<int> primes)
{
    int tablesize = primes.size() * 20;
    table = new std::list<int>[tablesize];
    size = tablesize;
    for (auto &prime : primes)
        this->insert(prime);
}

// Hash function
int HashTable::hash(int key)
{
    return key % size;
}

// Finds element
int HashTable::find(int key)
{
    // Get index from hash
    int index = hash(key);
    // Find element
    std::list<int>::iterator foundelement = std::find(table[index].begin(), table[index].end(), key);
    // If the element has been found, return its index
    // If not, return -1
    if (foundelement != table[index].end())
        return index;
    else
        return -1;
}

// Adds element to hashtable
void HashTable::insert(int element)
{
    // Get index from hash and insert the element
    int index = hash(element);
    table[index].push_back(element);
}
HashTable.h
#ifndef HASHTABLE_H
#define HASHTABLE_H
#include <list>
#include <iostream>
#include <vector>
class HashTable
{
private:
    // The hash table is an array of lists; each position stores a list of elements in case of collision
    std::list<int>* table;
    // Size of hashtable
    int size;
    // Hash function that returns the array location for the given key
    int hash(int key);
public:
    HashTable(int tablesize);
    HashTable(std::vector<int> primes);
    // Adds element to hashtable
    void insert(int element);
    // Deletes an element by key
    void remove(int key);
    // Returns the bucket index for a given key, or -1 if not found
    int find(int key);
    // Displays the hashtable
    void printTable();
    // Displays a histogram to illustrate the element distribution
    void printHistogram();
    // Returns the number of lists in the hash table
    int getSize();
    // Returns the total number of elements in the hash table
    int getNumberOfItems();
    // De-allocates all memory used for the hash table
    ~HashTable();
};
#endif
I've already tried increasing the table size far beyond the number of elements to eliminate collisions, but I didn't notice any difference.
Some things that are suboptimal with the hash table implementation:

- primes.size() * 20 is excessive: you'll get a lot more cache misses than necessary. Try multipliers in a range between 1 and ~2 to find an optimal point.
- primes.size() * 20 is always even, and all the prime numbers you hash with key % size are odd, so you never hash into half the buckets, wasting space and degrading cache performance.
- You handle collisions with linked lists: that means you're always following at least one pointer away from the table's contiguous memory, which is slow, and for collisions you jump around in memory again with every node in the list. Using std::vector<int> to store colliding values would cap the jumping at one memory area outside the hash table, or you could use closed hashing (open addressing) with displacement lists to typically find the element in a nearby hash table bucket; my benchmarks have found that to be around an order of magnitude faster for similar int values. A sketch combining a smaller, odd table size with std::vector buckets follows this list.
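To make those first points concrete, here is a minimal sketch that combines them: a table roughly 1.3 times the number of keys, forced to an odd size, with std::vector<int> buckets. The class name VectorHashTable and the 1.3 multiplier are illustrative choices of mine, not taken from your code.

#include <algorithm>
#include <vector>

class VectorHashTable
{
private:
    std::vector<std::vector<int>> table;
    int size;

    int hash(int key) const { return key % size; }

public:
    explicit VectorHashTable(const std::vector<int>& primes)
    {
        // Roughly 1.3 buckets per element instead of 20; "| 1" forces an odd
        // size so odd keys (such as primes > 2) can reach every bucket.
        size = static_cast<int>(primes.size() * 1.3) | 1;
        table.resize(size);
        for (int prime : primes)
            insert(prime);
    }

    void insert(int element)
    {
        table[hash(element)].push_back(element);
    }

    // Returns the bucket index if found, -1 otherwise (same contract as your find()).
    int find(int key) const
    {
        int index = hash(key);
        const std::vector<int>& bucket = table[index];
        if (std::find(bucket.begin(), bucket.end(), key) != bucket.end())
            return index;
        return -1;
    }
};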
If your data is completely random, it may be difficult to find a good constant for the modulo operation. If your data follows some sort of pattern, you may want to try running through a bunch of candidate constants to see which one performs best on your data; a sketch of such a scan follows.
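As a rough illustration of that idea, a scan over candidate table sizes could look like the sketch below; the candidate range and the "longest bucket" scoring are assumptions I've made, and you could score by average bucket length instead.

#include <algorithm>
#include <cstddef>
#include <vector>

// For each candidate size, hash every key with key % candidate and record the
// longest bucket that results; a smaller maximum means a more even distribution.
// Assumes minSize >= 1 and non-negative keys.
int bestTableSize(const std::vector<int>& keys, int minSize, int maxSize)
{
    int best = minSize;
    std::size_t bestWorst = keys.size() + 1;
    for (int candidate = minSize; candidate <= maxSize; ++candidate)
    {
        std::vector<std::size_t> counts(candidate, 0);
        for (int key : keys)
            ++counts[key % candidate];
        std::size_t worst = *std::max_element(counts.begin(), counts.end());
        if (worst < bestWorst)
        {
            bestWorst = worst;
            best = candidate;
        }
    }
    return best;
}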
In this post I showed how such a large-scale test could be structured. In the end my hash table produced an average lookup in 1.5 comparisons with a worst case of 14. The table contained 16000 entries, roughly 2^14.
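I don't have that test here, but the measurement itself is easy to reproduce. A minimal sketch, assuming separate chaining with std::list buckets like yours; countComparisons and reportLookupCost are hypothetical helpers, not part of your HashTable class:

#include <algorithm>
#include <iostream>
#include <list>
#include <vector>

// Hypothetical helper: how many stored values a lookup inspects before it finds key.
int countComparisons(const std::list<int>& bucket, int key)
{
    int comparisons = 0;
    for (int value : bucket)
    {
        ++comparisons;
        if (value == key)
            break;
    }
    return comparisons;
}

// Prints the average and worst-case comparisons per lookup across all keys.
void reportLookupCost(const std::vector<std::list<int>>& buckets, const std::vector<int>& keys)
{
    long long total = 0;
    int worst = 0;
    for (int key : keys)
    {
        int c = countComparisons(buckets[key % buckets.size()], key);
        total += c;
        worst = std::max(worst, c);
    }
    std::cout << "average comparisons: " << static_cast<double>(total) / keys.size()
              << ", worst case: " << worst << "\n";
}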