
Finding An Alternative To Abusing Enums

In a project I've been helping with recently, the entire code base depends on a monstrous enum that's effectively used as the key set for a glorified hash table. The problem is that it is now HUGE, so any change to the enum triggers what is basically a full rebuild of an already large code base. This takes forever, and I would really LOVE to replace it.

enum Values
{
    Value = 1,
    AnotherValue = 2,
    // <Couple Thousand Entries>
    NumValues // Sentinel value for creating arrays of the right size
};

What I'm looking for are ways to replace this enum while keeping the system typesafe (no unchecked strings) and compatible with MSVC2010 (no constexpr). Extra compile overhead is acceptable, as it might still take less time than recompiling a bunch of files.

My current attempts can basically be summed up as delaying defining the values until link time.

Examples of its use

GetValueFromDatabase(Value);
AddValueToDatabase(Value, 5);
int TempArray[NumValues];

Edit: Compile-time and runtime preprocessing are acceptable, as is basing the solution on some kind of caching data structure at runtime.

BlamKiwi asked Jan 01 '15 08:01



1 Answer

One way to achieve this is with a key class that wraps the numeric ID and cannot be instantiated directly, which forces all references to go through a type-safe variable:

// key.h

#include <stddef.h>  // size_t
#include <string>

namespace keys {

// Identifies a unique key in the database
class Key {
  public:
    virtual ~Key() {}
    // The numeric ID of the key
    virtual size_t id() const = 0;
    // The string name of the key, useful for debugging
    virtual const std::string& name() const = 0;
};

// The total number of registered keys
size_t count();

// Internal helpers. Do not use directly outside this code.
namespace internal {
  // Lazily allocates a new instance of a key or retrieves an existing one.
  const Key& GetOrCreate(const std::string& name, size_t id);
}
}

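// STRINGIFY is not defined elsewhere in this snippet; a minimal
// definition that turns the macro argument into a string literal:
#define STRINGIFY(x) #x

#define DECLARE_KEY(name) \
   extern const ::keys::Key& name

#define DEFINE_KEY(name, id) \
   const ::keys::Key& name = ::keys::internal::GetOrCreate(STRINGIFY(name), id)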

With the code above, the definition of keys would look like this:

 // some_registration.h
 DECLARE_KEY(Value);
 DECLARE_KEY(AnotherValue);
 // ...

 // some_registration.cpp
 DEFINE_KEY(Value, 1);
 DEFINE_KEY(AnotherValue, 2);
 // ...

Importantly, the registration code above can now be split across several files, so you no longer need to recompile all the definitions at once. For example, you could break the registration into logical groupings. If you then add a new entry, only that one group's .cpp file and the code that actually includes the corresponding *.h file need to be recompiled; code that doesn't reference that particular group of keys is unaffected.

The usage would be very similar to before:

 GetValueFromDatabase(Value);
 AddValueToDatabase(Value, 5);
 std::vector<int> temp(keys::count());  // requires <vector>; freed automatically
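
To make the type-safety point concrete, here is a minimal sketch of what the database side might look like. Database, FixedKey, and the method names are hypothetical stand-ins rather than part of the code above, and Key is reduced to just id() so that the example compiles on its own.

```cpp
#include <cstddef>
#include <map>

// Reduced stand-in for keys::Key: only the numeric ID is kept.
class Key {
  public:
    virtual ~Key() {}
    virtual std::size_t id() const = 0;
};

// Hypothetical concrete key, present only so the sketch can run by itself.
class FixedKey : public Key {
  public:
    explicit FixedKey(std::size_t id) : id_(id) {}
    virtual std::size_t id() const { return id_; }
  private:
    const std::size_t id_;
};

// Hypothetical database whose API accepts keys only as const Key&.
class Database {
  public:
    void Add(const Key& key, int value) { values_[key.id()] = value; }

    // Returns true and writes *out if a value is stored under the key.
    bool Get(const Key& key, int* out) const {
      std::map<std::size_t, int>::const_iterator it = values_.find(key.id());
      if (it == values_.end()) return false;
      *out = it->second;
      return true;
    }
  private:
    std::map<std::size_t, int> values_;
};
```

Because Key is abstract and nothing converts an int or a string literal to it, a call such as db.Add(5, 5) or db.Add("Value", 5) is rejected by the compiler, which is the property the question asks for.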

The corresponding key.cpp file to accomplish this would look like this:

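// key.cpp
#include "key.h"

#include <string>
#include <vector>

// ScopedLock, Mutex, LazyStaticPtr, and LOG(FATAL) are not standard C++;
// they are assumed to come from your own base library or from a logging
// library such as Google Log.

namespace keys {
namespace {
class KeyImpl : public Key {
  public:
    KeyImpl(const std::string& name, size_t id) : id_(id), name_(name) {}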
    ~KeyImpl() {}
    virtual size_t id() const { return id_; }
    virtual const std::string& name() const { return name_; }

  private:
    const size_t id_;
    const std::string name_;
};

class KeyList {
  public:
    KeyList() {}
    ~KeyList() {
      // This will happen only on program termination. We intentionally
      // do not clean up "keys_" and just let this data get cleaned up
      // when the entire process memory is deleted so that we do not
      // cause existing references to keys to become dangling.
    }

    const Key& Add(const std::string& name, size_t id) {
       ScopedLock lock(&mutex_);
       if (id >= keys_.size()) {
         keys_.resize(id + 1);
       }

       const Key* existing = keys_[id];
       if (existing) {
         if (existing->name() != name) {
            // Potentially some sort of error handling
            // or generation here... depending on the
            // desired semantics, for example, below
            // we use the Google Log library to emit
            // a fatal error message and crash the program.
            // This crash is expected to happen at start up.
            LOG(FATAL) 
               << "Duplicate registration of key with ID "
               << id << " seen while registering key named "
               << "\"" << name << "\"; previously registered "
               << "with name \"" << existing->name() << "\".";
         }
         return *existing;
       }

       Key* result = new KeyImpl(name, id);
       keys_[id] = result;
       return *result;
    }

    size_t length() const {
       ScopedLock lock(&mutex_);
       return keys_.size();
    }
  private:
    std::vector<const Key*> keys_;
    mutable Mutex mutex_;
};

static LazyStaticPtr<KeyList> keys_list;
}

size_t count() {
  return keys_list->length();
}

namespace internal {
  const Key& GetOrCreate(const std::string& name, size_t id) {
    return keys_list->Add(name, id);
  }
}
}
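
The listing above leans on helpers that are not standard C++ (ScopedLock, Mutex, LazyStaticPtr, LOG(FATAL)). If those are not available, the same idea can be sketched with only the standard library. Treat this as a simplification under stated assumptions: the namespace keys_sketch is hypothetical, locking is left out (it assumes all registration happens on one thread during static initialization), and a duplicate ID throws std::logic_error instead of logging a fatal error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace keys_sketch {  // hypothetical namespace, not the answer's "keys"

class Key {
  public:
    virtual ~Key() {}
    virtual std::size_t id() const = 0;
    virtual const std::string& name() const = 0;
};

namespace detail {

class KeyImpl : public Key {
  public:
    KeyImpl(const std::string& name, std::size_t id) : id_(id), name_(name) {}
    virtual std::size_t id() const { return id_; }
    virtual const std::string& name() const { return name_; }
  private:
    const std::size_t id_;
    const std::string name_;
};

// Function-local static: created on first use, so registrations that run
// during static initialization of other translation units still find it.
// The vector is intentionally never deleted, so references to keys stay
// valid until the process exits. No locking: single-threaded use assumed.
inline std::vector<const Key*>& Registry() {
  static std::vector<const Key*>* registry = new std::vector<const Key*>();
  return *registry;
}

}  // namespace detail

// Returns the key registered under id, creating it on first use.
// Throws std::logic_error if id is already taken by a different name.
inline const Key& GetOrCreate(const std::string& name, std::size_t id) {
  std::vector<const Key*>& keys = detail::Registry();
  if (id >= keys.size()) {
    keys.resize(id + 1);  // new slots are null pointers
  }
  if (keys[id]) {
    if (keys[id]->name() != name) {
      throw std::logic_error("duplicate key ID registered for \"" + name + "\"");
    }
    return *keys[id];
  }
  keys[id] = new detail::KeyImpl(name, id);
  return *keys[id];
}

// One past the highest registered ID, mirroring the NumValues sentinel.
inline std::size_t count() { return detail::Registry().size(); }

}  // namespace keys_sketch
```

An exception that escapes the initializer of a namespace-scope variable ends the program via std::terminate, so a conflicting DEFINE_KEY-style registration would still fail at start-up, much like the LOG(FATAL) version.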

As aptly noted in the comments, one drawback of decentralized registration is that conflicts become possible: the same ID can be used for more than one key. The example code above reports this as an error, but only at runtime, whereas it would be much nicer to surface it at compile time. Some ways to mitigate this are commit hooks that run tests checking for duplicate IDs, or a policy for selecting IDs that makes reuse unlikely, such as a file recording the next available ID that must be incremented and submitted in order to allocate one. Alternatively, assuming that you are permitted to reshuffle the IDs (this solution assumes you must preserve the IDs you already have), you could generate the numeric ID automatically from the name, e.g. by taking a hash of the name, and possibly use other inputs such as __FILE__ to resolve collisions so that IDs stay unique.
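
Both the hash-based variant and the duplicate check that a commit-hook test could run can be sketched briefly. HashName and ClaimId are hypothetical names, and 32-bit FNV-1a is just one possible hash, chosen here because it is short. The code avoids constexpr and C++11 features to stay within what MSVC2010 accepts.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// 32-bit FNV-1a hash of a key name. The mask keeps the result within
// 32 bits on platforms where unsigned long is wider.
inline unsigned long HashName(const std::string& name) {
  unsigned long hash = 2166136261UL;  // FNV offset basis
  for (std::size_t i = 0; i < name.size(); ++i) {
    hash ^= static_cast<unsigned char>(name[i]);
    hash = (hash * 16777619UL) & 0xFFFFFFFFUL;  // FNV prime, mod 2^32
  }
  return hash;
}

// Records that id belongs to name. Returns false if id was already
// claimed by a different name, i.e. a conflict a test should report.
inline bool ClaimId(std::map<unsigned long, std::string>& seen,
                    unsigned long id, const std::string& name) {
  std::pair<std::map<unsigned long, std::string>::iterator, bool> result =
      seen.insert(std::make_pair(id, name));
  return result.second || result.first->second == name;
}
```

A test would call ClaimId once per registered key, with either hand-assigned or hashed IDs, and fail on the first false. Note that hashed IDs are sparse, so they cannot size an array the way NumValues does; storage indexed by ID would need a map or a separate dense index.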

Michael Aaron Safyan answered Sep 30 '22 15:09
