 

C - How to implement Set data structure?

Is there any tricky way to implement a set data structure (a collection of unique values) in C? All elements in a set will be of the same type, and plenty of RAM is available.

As far as I know, for integers it can be done fast and easily using value-indexed arrays. But I'd like to have a very general Set data type. And it would be nice if a set could include itself.

asked Apr 13 '10 at 15:04 by psihodelia




2 Answers

There are multiple ways of implementing set (and map) functionality, for example:

  • tree-based approach (ordered traversal)
  • hash-based approach (unordered traversal)

Since you mentioned value-indexed arrays, let's try the hash-based approach, which builds naturally on top of the value-indexed array technique.

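Be aware of the trade-offs between the two: a hash-based set offers expected constant-time operations but no ordering, and its performance depends on the quality of the hash function; a tree-based set keeps its elements ordered and, if balanced, guarantees O(log n) operations.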

You can design a hash-set (a special case of a hash table) of pointers to hashable PODs, using chaining and internally represented as a fixed-size array of buckets of hashables, where:

  • all hashables in a bucket have the same hash value
  • a bucket can be implemented as a dynamic array or linked list of hashables
  • a hashable's hash value is used to index into the array of buckets (hash-value-indexed array)
  • one or more of the hashables contained in the hash-set could be (a pointer to) another hash-set, or even to the hash-set itself (i.e. self-inclusion is possible)

With large amounts of memory at your disposal, you can size your array of buckets generously and, in combination with a good hash function, drastically reduce the probability of collisions, achieving near-constant expected time per operation.
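
To put a number on "size your array of buckets generously": under the standard simple-uniform-hashing assumption (each element independently equally likely to land in any of the m buckets), storing n elements gives the load factor α below, the expected chain length equals α, and a chained lookup takes Θ(1 + α) expected time. The bucket and element counts in the second line are illustrative assumptions, not figures from the question.

```latex
\alpha = \frac{n}{m}, \qquad
\mathbb{E}[\text{chain length}] = \alpha, \qquad
T_{\text{lookup}} = \Theta(1 + \alpha)

m = 2^{20} = 1\,048\,576,\quad n = 500\,000
\;\Rightarrow\; \alpha = \frac{500\,000}{1\,048\,576} \approx 0.48
```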

You would have to implement:

  • the hash function for the type being hashed
  • an equality function for that type, used to test whether two hashables are equal
  • the hash-set contains/insert/remove operations.

You can also use open addressing as an alternative to maintaining and managing buckets.

answered Sep 22 '22 at 05:09 by vladr


Sets are usually implemented as some variety of balanced binary search tree. Red-black trees have good worst-case performance: insert, lookup, and delete are all O(log n).

These can also be used to build a map that allows key/value lookups.

This approach requires a consistent ordering on the elements of the set and on the keys of a map.

I'm not sure how you would manage a set that could possibly contain itself using binary trees if you limit set membership to well-defined types in C; comparison between such constructs could be problematic. You could do it easily enough in C++, though.

answered Sep 22 '22 at 05:09 by andand