Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

hash table for strings in c++

i've done in the past a small exercise about hashtable but the user was giving array size and also the struct was like this (so the user was giving a number and a word each time as input)

struct data
{
   int key;
   char c[20];
};

So it was quite simple since i knew the array size and also the user was saying how many items he will be give as input. The way i did it was

  • Hashing the keys the user gave me
  • find the position array[hashed(key)] in the array
  • if it was empty i would put the data there
  • if it wasn't i would put it in the next free position i would find.

But now i have to make inverted index and i am reasearching so i can make a hashtable for it. So the words will be collected from around 30 txts and they will be so many. So in this case how long should the array be? How can i hash words? Should i use hasing with open adressing or with chaining. The exercise sais that we could use a hash table as it is if we find it online. but i prefer to understand and create it by my own. Any clues will help me :)

In this exerice(inverted index using hash table) the structs looks like this. data type is the type of the hash table i will create.

struct posting
{
    string word;
    posting *next
}

struct data
{
    string word;
    posting *ptrpostings;
    data *next
};
like image 540
captain monk Avatar asked Apr 15 '13 11:04

captain monk


1 Answers

Hashing can be done anyway you choose. Suppose that the string is ABC. You can employ hashing as A=1, B=2, C=3, Hash = 1+2+3/(length = 3) = 2. But, this is very primitive.

The size of the array will depend on the hash algorithm that you deploy, but it is better to choose an algorithm that returns a definite length hash for every string. For example, if you chose to go with SHA1, you can safely allocate 40 bytes per hash. Refer Storing SHA1 hash values in MySQL Read up on the algorithm http://en.wikipedia.org/wiki/SHA-1. I believe that it can be safely used.

On the other hand, if it just for a simple exercise, you can also use MD5 hash. I wouldn't recommend using it in practical purposes as its rainbow tables are available easily :)

---------EDIT-------

You can try to implement like this ::

#include <iostream>
#include <string>
#include <stdlib.h>
#include <stdio.h>
#define MAX_LEN 30
using namespace std;

typedef struct
{
    string name; // for the filename
    ... change this to your specification
}hashd;

hashd hashArray[MAX_LEN]; // tentative

int returnHash(string s)
{
    // A simple hashing, no collision handled
    int sum=0,index=0;
    for(string::size_type i=0; i < s.length(); i++)
    {
        sum += s[i];
    }
    index = sum % MAX_LEN;
    return index;
}

int main()
{
    string fileName;
    int index;
    cout << "Enter filename        ::\t" ;
    cin >> fileName;
    cout << "Enter filename is     ::\t" + fileName << "\n";
    index = returnHash(fileName);
    cout << "Generated index is ::\t" << index << "\n";
    hashArray[index].name = fileName;
    cout << "Filename in array    ::\t" <<hashArray[index].name ;
    return 0;
}

Then, to achieve O(1), anytime you want to fetch the filename's contents, just run the returnHash(filename) function. It will directly return the index of the array :)

like image 199
Binayaka Chakraborty Avatar answered Sep 29 '22 06:09

Binayaka Chakraborty