Is there a clever way to implement a set data structure (a collection of unique values) in C? All elements in a set will be of the same type, and plenty of RAM is available. As far as I know, for integers it can be done quickly and easily using value-indexed arrays. But I'd like a very general Set data type, and it would be nice if a set could include itself.



C - How to implement Set data structure?

2 Answers

There are multiple ways of implementing set (and map) functionality, for example:

tree-based approach (ordered traversal)
hash-based approach (unordered traversal)

Since you mentioned value-indexed arrays, let's try the hash-based approach which builds naturally on top of the value-indexed array technique.
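For reference, the value-indexed array technique from the question might look roughly like the sketch below. The names `IntSet` and `INTSET_MAX` are made up for illustration, and the sketch assumes the values are small non-negative integers below a fixed bound.

```c
#include <stdbool.h>
#include <stddef.h>

#define INTSET_MAX 1024  /* hypothetical upper bound on stored values */

/* One flag per possible value: present[v] is true iff v is in the set. */
typedef struct {
    bool present[INTSET_MAX];
} IntSet;

/* Insert v; returns false if v is outside the representable range. */
static bool intset_insert(IntSet *s, size_t v) {
    if (v >= INTSET_MAX) return false;
    s->present[v] = true;
    return true;
}

/* Remove v; out-of-range values are silently ignored. */
static void intset_remove(IntSet *s, size_t v) {
    if (v < INTSET_MAX) s->present[v] = false;
}

/* Membership test: a single array lookup. */
static bool intset_contains(const IntSet *s, size_t v) {
    return v < INTSET_MAX && s->present[v];
}
```

The value itself is the array index here; a hash set generalizes this by computing the index from the value with a hash function.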

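Be aware of the trade-offs between the two approaches: a hash set offers expected constant-time operations but no ordering, while a balanced tree offers O(log n) operations with ordered traversal.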

You can design a hash-set (a special case of hash-tables) of pointers to hashable PODs, with chaining, internally represented as a fixed-size array of buckets of hashables, where:

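all hashables in a bucket map to the same bucket index (their hash values are equal modulo the number of buckets)
a bucket can be implemented as a dynamic array or linked list of hashables
a hashable's hash value, reduced modulo the number of buckets, is used to index into the array of buckets (hash-value-indexed array)
one or more of the hashables contained in the hash-set could be (a pointer to) another hash-set, or even to the hash-set itself (i.e. self-inclusion is possible, provided such an element is hashed and compared by address, since hashing it by content would recurse forever)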

With large amounts of memory at your disposal, you can size your array of buckets generously and, in combination with a good hash function, drastically reduce the probability of collision, achieving expected constant-time performance.

You would have to implement:

the hash function for the type being hashed
an equality function for the type, used to test whether two hashables are equal
the hash-set contains/insert/remove functionality
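The three pieces above could be sketched as follows. This is only an illustration under some assumptions: elements are stored as `const void *` pointers the set does not own, buckets are singly linked lists, the bucket count is fixed at creation, and every name (`HashSet`, `hashset_insert`, `ptr_hash`, ...) is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef size_t (*HashFn)(const void *item);            /* user-supplied hash */
typedef bool (*EqFn)(const void *a, const void *b);     /* user-supplied equality */

typedef struct Node {
    const void *item;        /* pointer to the element (not owned by the set) */
    struct Node *next;       /* next element in the same bucket */
} Node;

typedef struct {
    Node **buckets;          /* fixed-size array of bucket heads */
    size_t nbuckets;
    HashFn hash;
    EqFn eq;
} HashSet;

/* Create an empty set; returns NULL on allocation failure or nbuckets == 0. */
static HashSet *hashset_new(size_t nbuckets, HashFn hash, EqFn eq) {
    if (nbuckets == 0) return NULL;
    HashSet *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->buckets = calloc(nbuckets, sizeof *s->buckets);
    if (!s->buckets) { free(s); return NULL; }
    s->nbuckets = nbuckets;
    s->hash = hash;
    s->eq = eq;
    return s;
}

/* Return the link that points at the matching node, or the bucket's
   terminating NULL link if the item is absent. */
static Node **hashset_find(const HashSet *s, const void *item) {
    Node **link = &s->buckets[s->hash(item) % s->nbuckets];
    while (*link && !s->eq((*link)->item, item))
        link = &(*link)->next;
    return link;
}

static bool hashset_contains(const HashSet *s, const void *item) {
    return *hashset_find(s, item) != NULL;
}

/* Returns true if inserted, false if already present or out of memory. */
static bool hashset_insert(HashSet *s, const void *item) {
    Node **link = hashset_find(s, item);
    if (*link) return false;
    Node *n = malloc(sizeof *n);
    if (!n) return false;
    n->item = item;
    n->next = NULL;
    *link = n;
    return true;
}

/* Returns true if the item was present and has been unlinked. */
static bool hashset_remove(HashSet *s, const void *item) {
    Node **link = hashset_find(s, item);
    Node *n = *link;
    if (!n) return false;
    *link = n->next;
    free(n);
    return true;
}

/* Free the nodes and the table; the elements themselves are left alone. */
static void hashset_free(HashSet *s) {
    for (size_t i = 0; i < s->nbuckets; ++i) {
        for (Node *n = s->buckets[i]; n; ) {
            Node *next = n->next;
            free(n);
            n = next;
        }
    }
    free(s->buckets);
    free(s);
}

/* Identity-based hash and equality: elements are distinguished by address. */
static size_t ptr_hash(const void *p) { return (size_t)((uintptr_t)p >> 4); }
static bool ptr_eq(const void *a, const void *b) { return a == b; }
```

With the identity-based `ptr_hash`/`ptr_eq` pair, a call such as `hashset_insert(s, s)` stores a pointer to the set inside itself without any recursion. For content-based sets (strings, structs) you would pass a hash and equality function that look at the pointed-to data instead.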

You can also use open addressing as an alternative to maintaining and managing buckets.
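A rough sketch of the open-addressing alternative, using linear probing, is shown below. It assumes a set of C strings, a fixed capacity with no resizing, FNV-1a as the hash, and a tombstone marker so that removals do not break probe sequences; `StrSet`, `OA_CAPACITY` and the function names are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OA_CAPACITY 1024  /* hypothetical fixed slot count; no resizing here */

/* Sentinel for a slot whose element was removed; probing continues past it. */
static const char oa_tombstone_obj = 0;
#define OA_TOMBSTONE (&oa_tombstone_obj)

typedef struct {
    const char *slots[OA_CAPACITY];  /* NULL means the slot was never used */
} StrSet;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static size_t str_hash(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

static bool strset_contains(const StrSet *set, const char *key) {
    size_t i = str_hash(key) % OA_CAPACITY;
    for (size_t n = 0; n < OA_CAPACITY; ++n, i = (i + 1) % OA_CAPACITY) {
        const char *slot = set->slots[i];
        if (slot == NULL) return false;              /* end of probe sequence */
        if (slot != OA_TOMBSTONE && strcmp(slot, key) == 0) return true;
    }
    return false;
}

/* Returns true if inserted, false if already present or the table is full. */
static bool strset_insert(StrSet *set, const char *key) {
    size_t i = str_hash(key) % OA_CAPACITY;
    size_t free_slot = OA_CAPACITY;                  /* first reusable slot seen */
    for (size_t n = 0; n < OA_CAPACITY; ++n, i = (i + 1) % OA_CAPACITY) {
        const char *slot = set->slots[i];
        if (slot == NULL) {
            if (free_slot == OA_CAPACITY) free_slot = i;
            break;                                   /* key cannot be further on */
        }
        if (slot == OA_TOMBSTONE) {
            if (free_slot == OA_CAPACITY) free_slot = i;
        } else if (strcmp(slot, key) == 0) {
            return false;                            /* already present */
        }
    }
    if (free_slot == OA_CAPACITY) return false;      /* table is full */
    set->slots[free_slot] = key;
    return true;
}

/* Returns true if the key was present; its slot becomes a tombstone. */
static bool strset_remove(StrSet *set, const char *key) {
    size_t i = str_hash(key) % OA_CAPACITY;
    for (size_t n = 0; n < OA_CAPACITY; ++n, i = (i + 1) % OA_CAPACITY) {
        const char *slot = set->slots[i];
        if (slot == NULL) return false;
        if (slot != OA_TOMBSTONE && strcmp(slot, key) == 0) {
            set->slots[i] = OA_TOMBSTONE;
            return true;
        }
    }
    return false;
}
```

The set stores the string pointers it is given and does not copy them, so the strings must outlive the set. A production version would also grow and rehash the table once it fills up past some load factor.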

answered Sep 22 '22 05:09

vladr

Sets are often implemented as some variety of balanced binary search tree. Red-black trees guarantee O(log n) worst-case lookup, insertion, and removal.

These can also be used to build a map to allow key/value lookups.

This approach requires some sort of ordering on the elements of the set and the key values in a map.
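To show where the ordering requirement comes in, here is a minimal tree-based set driven by a user-supplied comparison function. Note that this is a plain, unbalanced binary search tree rather than a red-black tree, so it does not have the worst-case guarantees of a balanced tree; the names (`TreeSet`, `treeset_insert`, `cmp_str`, ...) are invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Total ordering on elements: negative, zero, or positive, like strcmp. */
typedef int (*CmpFn)(const void *a, const void *b);

typedef struct TreeNode {
    const void *item;                /* element pointer (not owned by the set) */
    struct TreeNode *left, *right;
} TreeNode;

typedef struct {
    TreeNode *root;
    CmpFn cmp;
} TreeSet;

/* Return the link that points at the matching node, or the NULL link
   where the item would have to be attached. */
static TreeNode **treeset_find(TreeSet *s, const void *item) {
    TreeNode **link = &s->root;
    while (*link) {
        int c = s->cmp(item, (*link)->item);
        if (c == 0) break;
        link = c < 0 ? &(*link)->left : &(*link)->right;
    }
    return link;
}

static bool treeset_contains(TreeSet *s, const void *item) {
    return *treeset_find(s, item) != NULL;
}

/* Returns true if inserted, false if already present or out of memory. */
static bool treeset_insert(TreeSet *s, const void *item) {
    TreeNode **link = treeset_find(s, item);
    if (*link) return false;
    TreeNode *n = malloc(sizeof *n);
    if (!n) return false;
    n->item = item;
    n->left = NULL;
    n->right = NULL;
    *link = n;
    return true;
}

/* In-order traversal: writes the elements in ascending order into out,
   starting at index i, and returns the index after the last one written. */
static size_t treeset_to_array(const TreeNode *n, const void **out, size_t i) {
    if (!n) return i;
    i = treeset_to_array(n->left, out, i);
    out[i++] = n->item;
    return treeset_to_array(n->right, out, i);
}

/* Free all nodes below and including n; the elements are left alone. */
static void treeset_free(TreeNode *n) {
    if (!n) return;
    treeset_free(n->left);
    treeset_free(n->right);
    free(n);
}

/* Example ordering for NUL-terminated strings. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(a, b);
}
```

A set is created with something like `TreeSet s = { NULL, cmp_str };`. Every operation goes through `cmp`, which is why the element type needs a well-defined ordering; the in-order walk is what gives the ordered traversal that a hash set lacks.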

I'm not sure how you would manage a set that could possibly contain itself using binary trees if you limit set membership to well-defined types in C; comparison between such constructs could be problematic. You could do it easily enough in C++, though.

answered Sep 22 '22 05:09

andand


Tags: c, algorithm, math, data-structures, set

psihodelia
