
Safe, efficient underlying data type for simple virtual machine

Tags: c++

A while ago I created a simple simulated computer. It had peripherals, a screen buffer that could be rendered to an OpenGL texture, and a few other neat features. It works, works well, and on the whole I'm quite happy with it.

Except, I cheated.

The underlying data type is a union of integer, float and an instruction type (split into bit fields).

For any correct (simulated) program, the union is always used safely, only ever reading from the last union member written to. However, a badly formed program (e.g. one loaded from a simulated hard drive) might read a member other than the last one written, which would expose me to the usual problems associated with union abuse:

  • The possibility that a write could be optimized away at compile time - the compiler couldn't possibly have enough information to attempt this optimization
  • The value read from the union could be garbage - this is perfectly acceptable behaviour to me.
  • A float read in this way could be a signaling-NaN/trap-value - this is a real problem - crashing the simulated computer is fine, but crashing the real program is a disaster.
  • It's technically undefined behaviour, so although it probably won't, it could set the computer on fire, erase my hard-drive or summon Cthulhu.
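The signaling-NaN concern above can be handled at the bit level before a value is ever treated as a float. The following is a minimal sketch, not part of the original post. It assumes `float` is IEEE-754 binary32 and that the quiet bit is the most significant mantissa bit (true on x86 and ARM, but not on some legacy targets). The names `quiet_nan_bits`, `float_from_bits` and `bits_from_float` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: IEEE-754 binary32 floats, quiet bit = top mantissa bit.
constexpr std::uint32_t kExponentMask = 0x7F800000u;
constexpr std::uint32_t kMantissaMask = 0x007FFFFFu;
constexpr std::uint32_t kQuietBit     = 0x00400000u;

// Returns the bit pattern unchanged unless it encodes a NaN, in which
// case the quiet bit is set so a signaling NaN becomes a quiet NaN.
inline std::uint32_t quiet_nan_bits(std::uint32_t bits)
{
    const bool is_nan = (bits & kExponentMask) == kExponentMask
                     && (bits & kMantissaMask) != 0;
    return is_nan ? (bits | kQuietBit) : bits;
}

// Reinterprets a 32-bit pattern as a float. std::memcpy copies the object
// representation, so no inactive union member is ever read.
inline float float_from_bits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    bits = quiet_nan_bits(bits);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Inverse direction: the raw bits of a float.
inline std::uint32_t bits_from_float(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Infinities and ordinary values pass through untouched; only NaN payloads are modified, and only by setting one bit.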

Solutions considered:

  • Sticking with the union - maybe it's sufficiently well defined for all real world platforms? Maybe there are ways to sanitize the sNaNs?
  • Tagged union - would effectively cut memory allowance in half
  • Separately stored array of efficiently packed tags - a little fiddly propagating the tag, but otherwise somewhat viable.
  • char array - seems simple, but the costs of doing it safely, allowing for a read from a type different to the one that was written, really add up.
  • Integer type - as above for float and instruction, with the difference that integers are trivial.
  • char array plus separate integer and float registers - characterful and in many ways ideal, but would require me to write a compiler that could use these effectively.
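As an illustration of the "Integer type" option, here is a hedged sketch in which every word is a plain 32-bit unsigned integer and each view is computed from the bits. The `Instruction` layout (8-bit opcode, 8-bit register, 16-bit immediate) is invented for the example and is not the layout from the original project. The sketch assumes a 32-bit IEEE-754 `float` and does not sanitize NaNs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical instruction layout, chosen only for illustration:
// bits 31..24 opcode, bits 23..16 register, bits 15..0 immediate.
struct Instruction
{
    std::uint8_t  opcode;
    std::uint8_t  reg;
    std::uint16_t immediate;
};

// A machine word stored as a 32-bit unsigned integer. Every accessor
// derives its result from the bits, so reading "the wrong type" yields
// a possibly meaningless value but never undefined behaviour.
class Word
{
public:
    void set_int(std::int32_t v) { bits_ = static_cast<std::uint32_t>(v); }
    void set_float(float v)
    {
        static_assert(sizeof(float) == sizeof(std::uint32_t),
                      "this sketch assumes a 32-bit float");
        std::memcpy(&bits_, &v, sizeof bits_);
    }
    void set_instruction(Instruction in)
    {
        // Pack the fields with shifts instead of relying on bit-field layout.
        bits_ = (std::uint32_t{in.opcode} << 24)
              | (std::uint32_t{in.reg} << 16)
              |  std::uint32_t{in.immediate};
    }

    std::int32_t as_int() const
    {
        std::int32_t v;
        std::memcpy(&v, &bits_, sizeof v);
        return v;
    }
    float as_float() const
    {
        float v;
        std::memcpy(&v, &bits_, sizeof v);
        return v;
    }
    Instruction as_instruction() const
    {
        return Instruction{
            static_cast<std::uint8_t>(bits_ >> 24),
            static_cast<std::uint8_t>((bits_ >> 16) & 0xFFu),
            static_cast<std::uint16_t>(bits_ & 0xFFFFu)};
    }

private:
    std::uint32_t bits_ = 0;
};
```

Compilers typically reduce these `std::memcpy` calls to a single register move, so the cost over the union is usually small, though that is worth measuring on the target platform.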

I imagine that this is the kind of project that many SO users have attempted at one time or another, so problem-specific experience is especially welcome.

DeveloperInDevelopment asked Oct 19 '22 02:10


1 Answer

If your compiler supports C++17, you could use std::variant (modeled on boost::variant). Bear in mind that it is a tagged union, so each word stores a discriminator next to its payload.
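A minimal sketch of what that might look like, assuming C++17. The `Instruction` struct and the `try_read_int` helper are hypothetical stand-ins, not taken from the question.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-in for the question's instruction type.
struct Instruction { std::uint32_t raw; };

using Word = std::variant<std::int32_t, float, Instruction>;

// Returns true and writes the value to `out` only if the word currently
// holds an int32; otherwise leaves `out` untouched and returns false.
inline bool try_read_int(const Word& w, std::int32_t& out)
{
    if (const std::int32_t* p = std::get_if<std::int32_t>(&w))
    {
        out = *p;
        return true;
    }
    return false;
}

// The discriminator lives alongside the payload, so a variant word is
// larger than the 4-byte union it would replace.
static_assert(sizeof(Word) > sizeof(std::int32_t), "variant carries a tag");
```

`std::get_if` returns a null pointer on a type mismatch rather than throwing, which lets the simulated machine raise its own fault without involving host exceptions.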


Edit: For maximally space-efficient, opt-in type safety, you could do something along the lines of

#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Instruction and MEM_SIZE are assumed to be defined elsewhere.
union Word { int32_t i; float f; Instruction inst; };

namespace MemAccess
{
        static std::bitset<MEM_SIZE> int32_whitelist,
                                     float_whitelist,
                                     inst_whitelist;
        static std::array<Word, MEM_SIZE> memory;
        // set or reinterpret as int32
        int32_t &
        int32_at(const size_t at)
        {
                int32_whitelist[at] = 1;
                float_whitelist[at] = inst_whitelist[at] = 0;

                return memory[at].i;
        }
        // interpret as int32 only if whitelisted
        int32_t &
        int32_checked(const size_t at)
        {
                if (int32_whitelist[at])
                {
                        return memory[at].i;
                }
                else
                {
                        // a bare `throw;` with no active exception would call std::terminate
                        throw std::logic_error("word is not tagged as int32");
                }
        }
        // equivalent functions for floats and instructions
}

Edit 2: It occurred to me that this could also be done with a single bitset, using two tag bits per word.

static std::array<Word, MEM_SIZE> memory;
static std::bitset<MEM_SIZE * 2> whitelist;   // two tag bits per word

float &
float_at(const size_t at)
{       // None = 00, Float = 01, Inst = 10, Int32 = 11
        whitelist[at * 2]     = 0;
        whitelist[at * 2 + 1] = 1;

        return memory[at].f;
}

float &
float_checked(const size_t at)
{
        if (!whitelist[at * 2] && whitelist[at * 2 + 1])
        {
                return memory[at].f;
        }

        // a bare `throw;` with no active exception would call std::terminate
        throw std::logic_error("word is not tagged as float");   // needs <stdexcept>
}
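The two-bit encoding can be wrapped in a pair of helpers so that each accessor does not repeat the bit arithmetic. This is a sketch under assumptions: the `Tag` enum, `set_tag`, `get_tag` and the value of `MEM_SIZE` are hypothetical. The bit order (first bit at index `at * 2`, second at `at * 2 + 1`) follows the float check above. It requires C++17 for the inline variable.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>

// Hypothetical memory size, for illustration only.
constexpr std::size_t MEM_SIZE = 1024;

// High bit is stored at index at*2, low bit at at*2 + 1.
enum class Tag : unsigned { None = 0b00, Float = 0b01, Inst = 0b10, Int32 = 0b11 };

inline std::bitset<MEM_SIZE * 2> tags;   // two tag bits per word

// Records which type was last written to the word at `at`.
inline void set_tag(std::size_t at, Tag t)
{
    if (at >= MEM_SIZE) throw std::out_of_range("address outside memory");
    const unsigned v = static_cast<unsigned>(t);
    tags[at * 2]     = ((v >> 1) & 1u) != 0;
    tags[at * 2 + 1] = (v & 1u) != 0;
}

// Returns the type last recorded for the word at `at`.
inline Tag get_tag(std::size_t at)
{
    if (at >= MEM_SIZE) throw std::out_of_range("address outside memory");
    const unsigned v = (tags[at * 2] ? 2u : 0u) | (tags[at * 2 + 1] ? 1u : 0u);
    return static_cast<Tag>(v);
}
```

A checked accessor then reduces to comparing `get_tag(at)` against the expected `Tag` before returning the member.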
Ray Hamel answered Oct 21 '22 04:10