
OneOfAType container -- storing one each of a given type in a container -- am I off base here?

Tags:

c++

I've got an interesting problem that's cropped up in a pass-based compiler of mine. Each pass knows nothing of the other passes, and a common object is passed down the chain as it goes, following the chain of responsibility pattern.

The object that is being passed along is a reference to a file.

Now, during one of the stages, one might wish to associate a large chunk of data with the file, such as its SHA512 hash, which takes a reasonable amount of time to compute. However, since that chunk of data is only used in that specific case, I don't want every file reference to reserve space for the SHA512. I also don't want other passes to have to recalculate the hash over and over again. For example, someone might only accept files which match a given list of SHA512s without wanting that value printed when the file reference gets to the end of the chain, or perhaps they want both, and so on.

What I need is some sort of container which contains only one object of a given type. If the container does not contain that type, it needs to create an instance of that type and store it somehow. It's basically a dictionary with the type being the key used to look things up.

Here's what I've got so far, the relevant bit being the FileData::Get<T> method:

#include <algorithm>   // std::find_if, std::for_each
#include <cstddef>     // std::size_t
#include <functional>  // std::equal_to, std::mem_fun_ref
#include <memory>      // std::auto_ptr
#include <string>
#include <vector>
#include <boost/bind.hpp>

class FileData;
// Cache entry interface
struct FileDataCacheEntry
{
    virtual void Initialize(FileData&)
    {
    }
    virtual ~FileDataCacheEntry()
    {
    }
};

// Cache itself
class FileData
{
    struct Entry
    {
        std::size_t identifier;
        FileDataCacheEntry * data;
        // Initializers listed in declaration order
        Entry(FileDataCacheEntry *dataToStore, std::size_t id)
            : identifier(id), data(dataToStore)
        {
        }
        std::size_t GetIdentifier() const
        {
            return identifier;
        }
        void DeleteData()
        {
            delete data;
        }
    };
    WindowsApi::ReferenceCounter refCount; // reference-counting helper, defined elsewhere
    std::wstring fileName_;
    std::vector<Entry> cache;
public:
    FileData(const std::wstring& fileName) : fileName_(fileName)
    {
    }
    ~FileData()
    {
        // Only the last copy sharing these entries frees them
        if (refCount.IsLastObject())
            std::for_each(cache.begin(), cache.end(), std::mem_fun_ref(&Entry::DeleteData));
    }
    const std::wstring& GetFileName() const
    {
        return fileName_;
    }

    //RELEVANT METHOD HERE
    template<typename T>
    T& Get()
    {
        // Linear search for an entry whose identifier equals T::TypeId
        std::vector<Entry>::iterator foundItem = 
            std::find_if(cache.begin(), cache.end(), boost::bind(
            std::equal_to<std::size_t>(), boost::bind(&Entry::GetIdentifier, _1), T::TypeId));
        if (foundItem == cache.end())
        {
            // Hold the new entry in an auto_ptr until the vector owns the raw pointer
            std::auto_ptr<T> newCacheEntry(new T);
            Entry toInsert(newCacheEntry.get(), T::TypeId);
            cache.push_back(toInsert);
            newCacheEntry.release();
            T& result = *static_cast<T*>(cache.back().data);
            result.Initialize(*this);
            return result;
        }
        else
        {
            return *static_cast<T*>(foundItem->data);
        }
    }
};

// Example item you'd put in cache
// (DWORD and FILETIME come from <windows.h>)
class FileBasicData : public FileDataCacheEntry
{
    DWORD    dwFileAttributes;
    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    unsigned __int64 size;
public:
    enum
    {
        TypeId = 42
    };
    virtual void Initialize(FileData& input)
    {
        // Get file attributes and friends...
    }
    DWORD GetAttributes() const;
    bool IsArchive() const;
    bool IsCompressed() const;
    bool IsDevice() const;
    // More methods here
};

int main()
{
    // Example use
    FileData fd(L"example.txt"); // placeholder file name; FileData has no default constructor
    FileBasicData& data = fd.Get<FileBasicData>();
    // etc
}

For some reason though, this design feels wrong to me, namely because it's doing a whole bunch of things with untyped pointers. Am I severely off base here? Are there preexisting libraries (boost or otherwise) which would make this clearer/easier to understand?

Billy ONeal asked Jul 11 '10 01:07


2 Answers

As ergosys already said, std::map is the obvious solution to your problem. But I can see your concerns with RTTI (and the associated bloat). As a matter of fact, an "any" value container does not need RTTI to work. It is sufficient to provide a mapping between a type and a unique identifier. Here is a simple class that provides this mapping:

#include <stdexcept>
#include <boost/shared_ptr.hpp>
class typeinfo
{
    private:
        typeinfo(const typeinfo&); 
        void operator = (const typeinfo&);
    protected:
        typeinfo(){}
    public:
        bool operator != (const typeinfo &o) const { return this != &o; }
        bool operator == (const typeinfo &o) const { return this == &o; }
        template<class T>
        static const typeinfo & get()
        {
            static struct _ti : public typeinfo {} _inst;
            return _inst;
        }
};

typeinfo::get<T>() returns a reference to a simple, stateless singleton which allows comparisons.

This singleton is created only for types T for which typeinfo::get< T >() is actually used somewhere in the program.

Now we use this to implement a top type we call value. value is a holder for a value_box, which actually contains the data:

class value_box
{
    public:
        // returns the typeinfo of the most derived object
        virtual const typeinfo& type() const =0;
        virtual ~value_box(){}
};

template<class T>
class value_box_impl : public value_box
{
    private:
        friend class value;
        T m_val; 
        value_box_impl(const T &t) : m_val(t) {}
        virtual const typeinfo& type() const
        {
            return typeinfo::get< T >();
        }
};
// specialization for void.
template<>
class value_box_impl<void> : public value_box
{
    private:
        // value needs access to the private operator new below
        friend class value;
        virtual const typeinfo& type() const
        {
            return typeinfo::get< void >();
        }
        // This is an optimization to avoid heap pressure for the 
        // allocation of stateless value_box_impl<void> instances:
        void* operator new(size_t) 
        {
            static value_box_impl<void> inst;
            return &inst;
        }
        // Nothing to free: the storage is the static instance above
        void operator delete(void*) 
        {
        }
};

Here's the bad_value_cast exception:

class bad_value_cast : public std::runtime_error
{
    public:
        bad_value_cast(const char *w="") : std::runtime_error(w) {}
};

And here's value:

class value
{
    private:
        boost::shared_ptr<value_box> m_value_box;       
    public:
        // a default value contains 'void'
        value() : m_value_box( new value_box_impl<void>() ) {}          
        // embed an object of type T.
        template<class T> 
        value(const T &t) : m_value_box( new value_box_impl<T>(t) ) {}
        // get the typeinfo of the embedded object
        const typeinfo & type() const {  return m_value_box->type(); }
        // convenience type to simplify overloading on return values
        template<class T> struct arg{};
        template<class T>
        T convert(arg<T>) const
        {
            if (type() != typeinfo::get<T>())
                throw bad_value_cast(); 
            // this is safe now
            value_box_impl<T> *impl=
                      static_cast<value_box_impl<T>*>(m_value_box.get());
            return impl->m_val;
        }
        void convert(arg<void>) const
        {
            if (type() != typeinfo::get<void>())
                throw bad_value_cast(); 
        }
};

The convenient casting syntax:

template<class T>
T value_cast(const value &v) 
{
    return v.convert(value::arg<T>());
}

And that's it. Here is what it looks like:

#include <string>
#include <map>
#include <iostream>
int main()
{
    std::map<std::string,value> v;
    v["zero"]=0;
    v["pi"]=3.14159;
    v["password"]=std::string("swordfish");
    std::cout << value_cast<int>(v["zero"]) << std::endl;
    std::cout << value_cast<double>(v["pi"]) << std::endl;
    std::cout << value_cast<std::string>(v["password"]) << std::endl;   
}

The nice thing about having your own implementation of any is that you can very easily tailor it to the features you actually need, which is quite tedious with boost::any. For example, there are few requirements on the types that value can store: they need to be copy-constructible and have a public destructor. What if all the types you use have an operator<<(ostream&, T) and you want a way to print your dictionaries? Just add a to_stream method to value_box, overload operator<< for value, and you can write:

std::cout << v["zero"] << std::endl;
std::cout << v["pi"] << std::endl;
std::cout << v["password"] << std::endl;

Here's a pastebin with the above, should compile out of the box with g++/boost: http://pastebin.com/v0nJwVLW

EDIT: Added an optimization to avoid the allocation of value_box_impl<void> from the heap: http://pastebin.com/pqA5JXhA

Nordic Mainframe answered Oct 06 '22 01:10


You can create a hash or map from string to boost::any. The string key can be obtained from any::type(), which returns a std::type_info whose name() is a string.

ergosys answered Oct 06 '22 01:10