Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which data structure to use for huge but constant dictionary in C++

I have to use a huge dictionary with integer (or enum) keys and string values. But this is totally constant. No way to change in runtime. Is there a way (using templates etc.) to retrieve dictionary data at compile time instead of using existing dictionary structure?

like image 646
Gulshan Avatar asked Sep 17 '11 10:09

Gulshan


People also ask

Which data structure is best for representing a dictionary?

Dictionaries are often implemented as hash tables. Each element stored in a dictionary is identified by a key of type K. Dictionary represents a mapping from keys to values. Dictionaries have numerous applications.

Which data structure would you use to implement dictionary?

Hashing is one simple option for this. We can put all words in a hash table.

Which data structure is commonly used to store a dictionary?

Hash Table. Hashing is a process used to uniquely identify objects and store each object at some pre-calculated unique index called its “key.” So, the object is stored in the form of a “key-value” pair, and the collection of such items is called a “dictionary.” Each object can be searched using that key.

Which data structure can you use as keys for your dictionary?

Typically, the keys in a dictionary must be simple types (such as integers or strings) while the values can be of any type. Different languages enforce different type restrictions on keys and values in a dictionary. Dictionaries are often implemented as hash tables.


2 Answers

Clang and LLVM have solved your issue by generating tables containing their objects, using a combination of code generation and preprocessor trickery.

You can skip either step, depending on your own setup. For example:

// records.inc
EXPAND_RECORD(Foo, "Foo", 4);
EXPAND_RECORD(Bar, "Bar", 18);
EXPAND_RECORD(Bar2, "Bar", 19);

Now, you can generate your enum:

// records.h
enum Record {

#define EXPAND_RECORD(Name, String, Value) Name,
#include "records.inc"
#undef EXPAND_RECORD

};

char const* getRecordName(Record r);
int getRecordValue(Record r);

// records.cpp

char const* getRecordName(Record r) {
  switch(r) {
#define EXPAND_RECORD(Name, String, Value) case Name: return String;
#include "records.inc"
#undef EXPAND_RECORD
  }

  abort(); // unreachable, or you can return a "null" value
}

int getRecordValue(Record r) {
  switch(r) {
#define EXPAND_RECORD(Name, String, Value) case Name: return Value;
#include "records.inc"
#undef EXPAND_RECORD
  }

  abort(); // unreachable, or you can return a "null" value
}

In Clang and LLVM, a code generation phase is used to generate the .inc from more pleasant definition files.

It works pretty well... but do be aware that any modification of the enum implies full recompilation. You might wish to go to a "codeset" approach, where the enum is used internally but never leaked outside, and stable values (those of the enum) are provided to the client (unsigned), so that old clients can link to the new libraries without recompilation: they will be limited to use the old set of codes, which is no problem if it's stable.

like image 120
Matthieu M. Avatar answered Sep 29 '22 22:09

Matthieu M.


Surely you can simply use sed to transform the dictionary into a string constant indexed by template parameter, with a header file like:

template <int Index> struct Dictionary { static const char *entry; };

and a source file with many lines of the form:

template <> const char *Dictionary<5>::entry = "Entry for five";

On the other hand, do you really want to do this from a maintenance perspective? It entails recompilation for every changed dictionary entry and bloated executable sizes.

like image 43
thiton Avatar answered Sep 29 '22 21:09

thiton