Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to iterate through a hash table in C

Tags:

c

hashtable

Edit: I have added all files (.c, .h and a Makefile). The program works but has many memory leaks.

http://dl.dropbox.com/u/21926200/Ngrams.tar.gz

How do you iterate through a hash table to call free()? I am wondering if I need to create a separate function to call free() on htable and results or I can do it inside this code. Any ideas? I just need to call free() on htable and results.

h_ptr *htable;
int tsize;

void new_table(int size)
{
    tsize = size;
    htable = (h_ptr *) calloc(size, sizeof(h_ptr));
    if (!htable) {
    fprintf(stderr, "Couldn't allocate hash array, exiting\n");
    exit(1);
    }

    // At the beginning, each element in the hash table
    // is a NULL pointer
    for(int i=0; i<size; i++)
      {
    htable[i]=NULL;
      }
}

// Allocate new element to store in hash table
h_ptr new_element(char *s)
{
    h_ptr result = (h_ptr) malloc(sizeof(h_rec));
    int wlen = strlen(s);
    if (wlen > llen) {
    lstring = s;
    llen = wlen;
    lcnt = 1;
    } else if (wlen == llen)
    lcnt++;
    if (!result) {
    fprintf(stderr, "Couldn't allocate hash element, exiting\n");
    exit(1);
    }

    result->word = s;
    result->freq = 1;
    result->next = NULL; 
    return result;
}

Thanks

like image 404
Damon Avatar asked Sep 10 '25 15:09

Damon


1 Answers

Before the 'whole program' was provided

Given the information in the new_element() function:

result->word = s;
result->freq = 1;
result->next = NULL; 

we have to infer (since you didn't include the actual information) that your hash table is an array where the elements are linked lists of individually allocated elements containing an allocated name, a frequency and a pointer to the next item. Therefore, freeing the hash table involves visiting each element of the array in turn, chasing down the linked list, freeing the name and the element.

void free_hashtable(void)
{
    if (htable != 0)
    {
        for (int i = 0; i < tsize; i++)
        {
            h_ptr next = 0;
            for (h_ptr curr = htable[i]; curr != 0; curr = next)
            {
                next = curr->next;
                free(curr->word);
                free(curr);
            }
        }
        free(htable);
        htable = 0;
        tsize = 0;
    }
}

After the 'whole program' was provided

We still don't have the data structures, so we still can't really say what is going on. So, we can add this to the top of the code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct hash_table_entry
{
    int freq;
    struct hash_table_entry *next;
    char *word;
} *h_ptr;

However, one issue that arises when we add that is:

 h_ptr result = (h_ptr) malloc(sizeof(h_rec));

The compiler says:

error: 'h_rec' undeclared (first use in this function)

identifying that line. Now, it might be that your have some typedef like:

typedef struct hash_table_entry h_rec;

but we should not be having to guess. Please create an SSCCE (Short, Self-Contained, Correct Example) so we do not have to guess.

I compile the code using:

gcc -O3 -g -std=c99 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition ht.c -o ht 

The compiler warns me about:

ht.c:27:6: warning: no previous prototype for ‘lowercase’ [-Wmissing-prototypes]
ht.c:41:6: warning: no previous prototype for ‘new_table’ [-Wmissing-prototypes]
ht.c:59:7: warning: no previous prototype for ‘new_element’ [-Wmissing-prototypes]
ht.c:82:10: warning: no previous prototype for ‘hash_function’ [-Wmissing-prototypes]
ht.c:103:7: warning: no previous prototype for ‘save_string’ [-Wmissing-prototypes]
ht.c:116:7: warning: no previous prototype for ‘find_element’ [-Wmissing-prototypes]
ht.c:142:7: warning: no previous prototype for ‘sort_words’ [-Wmissing-prototypes]
ht.c:176:6: warning: no previous prototype for ‘insert_string’ [-Wmissing-prototypes]
ht.c:191:6: warning: no previous prototype for ‘init_token’ [-Wmissing-prototypes]
ht.c:201:7: warning: no previous prototype for ‘get_word’ [-Wmissing-prototypes]
ht.c:224:7: warning: no previous prototype for ‘get_token’ [-Wmissing-prototypes]
ht.c:276:6: warning: no previous prototype for ‘word_freq’ [-Wmissing-prototypes]
ht.c: In function ‘main’:
ht.c:322:5: warning: implicit declaration of function ‘add_string_option’ [-Wimplicit-function-declaration]
ht.c:323:5: warning: implicit declaration of function ‘add_int_option’ [-Wimplicit-function-declaration]
ht.c:328:5: warning: implicit declaration of function ‘parse_options’ [-Wimplicit-function-declaration]
ht.c:329:5: warning: implicit declaration of function ‘show_options’ [-Wimplicit-function-declaration]

And for the last four names, the linker fails to find them. We need code that doesn't use code that is not available. Reduce your code to an SSCCE (note self-contained!).

I've hacked out the calls to those functions and simplified main accordingly. When run on its own source (./ht < ht.c), it produces:

N-gram size 1
Running analysis... (This can take several minutes or more!)
 Initializing hash table...
 Inserting all n-grams into hash table in lowercase form...
 Sorting all hash table elements according to frequency...

Analysis Details:
(Top 10 list of n-grams)
82  '='
30  'int'
27  '0'
23  'if'
23  'char'
22  's'
16  '1'
16  'i'
14  'ls'
14  'h_ptr'

Analysis Summary:
817 total n-grams
211 unique n-grams
94 singleton n-grams (occur only once)
Most common n-gram (with 82 occurrences) is '='
Longest n-gram (1 have length 16) is 'hash_table_entry'
Total time = 0.002225 seconds

Looks OK...but run the optimized build under valgrind and another story shows up...but I've since concluded that the problem is in the optimized implementation of strlen() in the specific version of GCC I'm using (because, in part, using a different version of GCC on the very same code makes the errors go away, and because the errors were in a call to strlen() which has no business reading 4 bytes starting at offset 4 in an allocation of 5 bytes).

I added a function print_hashtable(). I think that for almost any non-trivial structure, it is worth having a 'print' or 'dump' function available like this.

static void print_hashtable(FILE *fp, const char *tag)
{
    fprintf(fp, "Print hash table: %s\n", tag);
    fprintf(fp, "Size = %d; Address: %p\n", tsize, htable);
    if (htable != 0)
    {
        for (int i = 0; i < tsize; i++)
        {
            printf("Entry %d (%p)\n", i, htable[i]);
            h_ptr next = 0;
            for (h_ptr curr = htable[i]; curr != 0; curr = next)
            {
                next = curr->next;
                printf("Word: %s (freq %d; next %p)\n", curr->word, curr->freq, next);
            }
        }
    }
}

I inserted a call to this just before sort_words(). This showed that despite the not dreadfully effective hash algorithm, the data structure was intact.

I also inserted a call to print_hashtable() after sort_words(), and that showed that the hash table was comprehensively destroyed by the sorting process. The sort phase eliminates the hash table as a hash table; the only thing to do is to release the table as whole (free(htable)). Freeing all the table entries has to be done at the foot of the function word_freq():

static void free_hashlist(h_ptr head)
{
    while (head != 0)
    {
        h_ptr next = head->next;
        printf("Free: %d (%s) %p -> %p\n", head->freq, head->word, (void *)head, (void *)next);
        free(head->word);
        free(head);
        head = next;
    }
}

static void word_freq(FILE *src, int details, int size)
{
    char *s;
    h_ptr list_head, ptr;

    printf(" Initializing hash table...\n");
    init_token(src);
    new_table(size);

    printf(" Inserting all n-grams into hash table in lowercase form...\n");
    while ((s = get_word()))
        insert_string(s);

    print_hashtable(stdout, "After reading data");

    printf(" Sorting all hash table elements according to frequency...\n");
    list_head = sort_words();

    if (details > 0)
    {
        printf("\nAnalysis Details:\n(Top %i list of n-grams)\n", details);
        ptr = list_head;
        while (ptr && details--)
        {
            printf("%d\t'%s'\n", ptr->freq, ptr->word);
            ptr = ptr->next;
        }
    }

    printf("\nAnalysis Summary:\n");
    printf("%d total n-grams\n", wcnt);
    printf("%d unique n-grams\n", ucnt);
    printf("%d singleton n-grams (occur only once)\n", scnt);
    printf("Most common n-gram (with %d occurrences) is '%s'\nLongest n-gram (%d have length %d) is '%s'\n",
            mcnt, mstring, lcnt, llen, lstring);

    free_hashlist(list_head);
}

And the free_hashtable() code has to be simplified:

static void free_hashtable(void)
{
    if (htable != 0)
    {
        free(htable);
        htable = 0;
        lcnt = 0;
        lstring = 0;
        mcnt = 0;
        mstring = 0;
        scnt = 0;
        tsize = 0;
        wcnt = 0;
        ucnt = 0;
    }
}

int main(void)
{
    printf("\nRunning analysis... (This can take several minutes or more!)\n");
    word_freq(stdin, 10, 1024);
    printf("Total time = %f seconds\n", (double) clock() / CLOCKS_PER_SEC);
    free_hashtable();
    return 0;
}

In this program, it's not critical to reset the pointers and counts to null/zero. Also, the lstring and mstring pointers are invalidated by free_hashlist(), so arguably those should be nulled/zeroed in that function.

This runs leak-free for me, and with only system-allocated memory still in use at the end.

like image 68
Jonathan Leffler Avatar answered Sep 13 '25 11:09

Jonathan Leffler