I am an accomplished C programmer, and I have written an assembler and VM in C (https://github.com/chucktilbury/assembler (mostly working but not complete)) and I am thinking of porting it to C++ so I can use the STL for obvious reasons. This is a hobby project with no expectations of actually being useful to anyone.
My VM has a notion of a Value, which can have exactly one of several types. I have implemented it using a struct with a union embedded in it. This has led to a bunch of huge macros to do simple things like addition and comparisons.
What I would like to do is have a base class that can be used to reference a Value and child classes that implement the specific type so that I don't have to spell out exactly how to retrieve the data that the Value implements.
In other words, this is what I have:
...
switch(left_val->type) {
...
case FLOAT:
switch(right_val->type) {
...
case INTEGER:
result->type = FLOAT;
result->data.float_val = left_val->data.float_val + right_val->data.int_val;
break;
...
}
...
}
What I would like to see:
result->add(left_val, right_val); // over-simplified, I know....
Obviously, I can do this like I did it in C, and that's my knee-jerk reaction. But I feel that I am missing some major point of how this should work in C++.
As I understand it, there is no way to achieve what I want because C++ (like C) is a statically typed language. I may as well just encapsulate the ugly in a single class and just pass around Value(s) as I am currently doing in C.
Is there general agreement with that statement? Or am I totally wrong?
I suggest using a std::variant and std::visit to reduce the boilerplate.
Example:
#include <cstdint>
#include <variant>
using INTEGER = std::intmax_t;
using FLOAT = long double;
using Value = std::variant<INTEGER, FLOAT>;
auto operator+(const Value& lhs, const Value& rhs) {
return std::visit([](auto&& l, auto&& r) -> Value { return l + r; }, lhs, rhs);
}
Now adding two INTEGERs would produce a variant holding an INTEGER. Involving a FLOAT in the addition would make the result FLOAT, just like in the normal rules of the language.
Example usage:
int main() {
Value i = 10;
Value f = 3.14159;
auto r = i + f;
}
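To check which alternative a sum ends up holding, std::holds_alternative and std::get can be used. The following is a minimal sketch that repeats the definitions above so it compiles on its own; the helper name `is_float` is made up for illustration and is not part of the answer.

```cpp
#include <cstdint>
#include <variant>

using INTEGER = std::intmax_t;
using FLOAT = long double;
using Value = std::variant<INTEGER, FLOAT>;

Value operator+(const Value& lhs, const Value& rhs) {
    // The lambda is instantiated once for every pair of alternatives,
    // so the usual arithmetic conversions decide the result type.
    return std::visit([](auto&& l, auto&& r) -> Value { return l + r; }, lhs, rhs);
}

// Hypothetical helper: true if v currently holds a FLOAT.
bool is_float(const Value& v) { return std::holds_alternative<FLOAT>(v); }
```

With these definitions, adding two values that both hold an INTEGER gives a Value for which `is_float` is false, and `std::get<INTEGER>` retrieves the sum. As soon as one operand holds a FLOAT, `is_float` is true for the result.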
A C++20 version (kindly provided by user17732522):
auto operator+(const Value& lhs, const Value& rhs) {
return std::visit<Value>(std::plus{}, lhs, rhs);
}
A more generic version, which supports mixing user-defined types, does not use any fundamental (or std::) types directly in the variant:
template<class... Ts>
std::variant<Ts...> operator+(const std::variant<Ts...>& lhs,
const std::variant<Ts...>& rhs) {
return std::visit<std::variant<Ts...>>(std::plus{}, lhs, rhs);
}
Then create user-defined types and define Value as a variant that can hold one of those types. Here's an example with a few of the types you mentioned in the comments:
struct Error {};
struct Number {};
struct String {};
struct Hash {};
using Value = std::variant<Error, Number, String, Hash>;
If the types have something in common, you could also add inheritance, but for now, this will suffice.
You must also decide what should happen for all combinations involving these types. I'd create a table for all operators you aim to support to think it through properly before implementing it. It could look like this for operator+:
| ↓ lhs + rhs → | Number | String | Hash |
|---|---|---|---|
| Number | Number | String | Error |
| String | String | String | Error |
| Hash | Error | Error | Hash |
I've left the Error row and column out because anything involving an operation with Error should most likely result in Error. If you disagree, add Error to the table too.
Now implement the overloads that satisfy this table. The code below can most likely be refined using more templates to cover multiple cases, but it shows the overloads that need to exist one way or another:
// if at least one operand is Error, return Error
Error operator+(const Error&, const Error&) { return {}; }
template<class T> Error operator+(const Error&, const T&) { return {}; }
template<class T> Error operator+(const T&, const Error&) { return {}; }
// valid Number operations:
Number operator+(const Number&, const Number&) { /*impl*/ }
String operator+(const Number&, const String&) { /*impl*/ }
// anything else return Error
Error operator+(const Number&, const Hash&) { return {}; }
// valid String operations:
String operator+(const String&, const String&) { /*impl*/ }
String operator+(const String&, const Number&) { /*impl*/ }
// anything else return Error
Error operator+(const String&, const Hash&) { return {}; }
// valid Hash operations:
Hash operator+(const Hash&, const Hash&) { /*impl*/ }
// anything else return Error
Error operator+(const Hash&, const Number&) { return {}; }
Error operator+(const Hash&, const String&) { return {}; }
If you forget one overload, your compiler will tell you, which is very nice. The visitor inside the operator+ that takes variants requires the set of overloads to be exhaustive: every combination of alternatives must have a matching operator+.
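To make the table concrete, here is a sketch in which the stub types carry some data. The payload members (`v`, `s`, `items`) and the semantics chosen for each cell (string concatenation via std::to_string, merging hashes) are assumptions made for illustration only. The variant operator+ uses a lambda with an explicit return type instead of `std::visit<R>`, so it should also build as C++17.

```cpp
#include <map>
#include <string>
#include <variant>

// Payloads are assumptions for this sketch; the answer leaves the types empty.
struct Error {};
struct Number { double v = 0; };
struct String { std::string s; };
struct Hash { std::map<std::string, double> items; };

// If at least one operand is Error, the result is Error.
Error operator+(const Error&, const Error&) { return {}; }
template<class T> Error operator+(const Error&, const T&) { return {}; }
template<class T> Error operator+(const T&, const Error&) { return {}; }

// Number row of the table.
Number operator+(const Number& a, const Number& b) { return {a.v + b.v}; }
String operator+(const Number& a, const String& b) { return {std::to_string(a.v) + b.s}; }
Error operator+(const Number&, const Hash&) { return {}; }

// String row of the table.
String operator+(const String& a, const String& b) { return {a.s + b.s}; }
String operator+(const String& a, const Number& b) { return {a.s + std::to_string(b.v)}; }
Error operator+(const String&, const Hash&) { return {}; }

// Hash row of the table.
Hash operator+(const Hash& a, const Hash& b) {
    Hash r = a;
    r.items.insert(b.items.begin(), b.items.end()); // on a key clash, a's entry is kept
    return r;
}
Error operator+(const Hash&, const Number&) { return {}; }
Error operator+(const Hash&, const String&) { return {}; }

// Dispatches to the overloads above; a missing combination fails to compile.
template<class... Ts>
std::variant<Ts...> operator+(const std::variant<Ts...>& lhs,
                              const std::variant<Ts...>& rhs) {
    using V = std::variant<Ts...>;
    return std::visit([](const auto& l, const auto& r) -> V { return l + r; }, lhs, rhs);
}

using Value = std::variant<Error, Number, String, Hash>;
```

Adding two Values that hold Numbers yields a Value holding a Number, while any combination marked Error in the table yields a Value holding Error. Keeping the variant operator+ a template matters here: a non-template overload taking `const Value&` would itself be a viable candidate for a missing combination (through the variant's converting constructor), which would hide the compiler error.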
Then repeat for the other operators, with std::visit calls using std::minus, std::multiplies and std::divides. It's tedious work, but you do not need to implement the logic that looks up the correct overload. The visitors will build the complete dispatch table for you, and you only have to implement one operator overload at a time according to the matrices you've made for the operators.
Extending it is also pretty straightforward. You add the new type to your operator matrices and to Value and then implement the missing operators. Again, the compiler will be most helpful and point out if you miss anything.
One way to reduce some of the implementation work could be to make some types convertible to others. If, for example, String + Number should convert the Number to a String and then perform the String + String operation, you could do this:
struct Error {};
struct Number { };
struct String {
String() = default;
String(const Number&) {} // converting constructor
};
struct Hash {};
Now these implementations can simply be removed:
String operator+(const Number&, const String&) { /*impl*/ }
String operator+(const String&, const Number&) { /*impl*/ }
Thinking about such valid conversions may reduce the workload considerably.
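A sketch of how that plays out with overload resolution, under the same kind of assumptions as before: the payload members `v` and `s` and the use of std::to_string for the conversion are invented for illustration, and Hash is left out to keep the example short. There is no Number/String overload at all; the converting constructor lets String + String handle the mixed cases.

```cpp
#include <string>
#include <variant>

struct Error {};
struct Number { double v = 0; };
struct String {
    std::string s;
    String() = default;
    explicit String(const std::string& str) : s(str) {}
    String(const Number& n) : s(std::to_string(n.v)) {} // converting constructor
};

// If at least one operand is Error, the result is Error.
Error operator+(const Error&, const Error&) { return {}; }
template<class T> Error operator+(const Error&, const T&) { return {}; }
template<class T> Error operator+(const T&, const Error&) { return {}; }

Number operator+(const Number& a, const Number& b) { return {a.v + b.v}; }

// Also selected for Number + String and String + Number: the Number operand
// is implicitly converted to String first.
String operator+(const String& a, const String& b) { return String{a.s + b.s}; }

template<class... Ts>
std::variant<Ts...> operator+(const std::variant<Ts...>& lhs,
                              const std::variant<Ts...>& rhs) {
    using V = std::variant<Ts...>;
    return std::visit([](const auto& l, const auto& r) -> V { return l + r; }, lhs, rhs);
}

using Value = std::variant<Error, Number, String>;
```

Number + Number still picks the exact Number overload, because an exact match is preferred over one that needs a user-defined conversion.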
I have something very similar written in C++. I am assuming you have no expectation of performance. Here are some key points that might help:
You will need a way to store the type information for every piece of data. Instead of storing the data type in an enum, you should use a class for this purpose. If you want to be able to support embedding additional types, your type system should be robust. Since I was aiming for an object-oriented language, I have included data members and member functions (including operators and constructors) for types.
If you want to support built-in types only, you may register objects/classes that handle the operators for each type. Then, once you have obtained a piece of data, you can get its type and invoke the necessary function. You may build this system using polymorphism, but every time you copy the data, you will also copy the type. Therefore, you need a function that clones your object properly. You could also go a simpler route, but this will take more memory.
Type IntType() {
Type t;
t.addition = &add<int>;
//...
return t;
}
You can also create only one IntType and then hand out pointers to it to each piece of data, similar to a vftable.
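A rough sketch of that shared-descriptor idea follows. Everything here is an assumption for illustration rather than the code from the project: the `Data` struct, the `name` field, the helpers `make_int` and `add_values`, and the choice of std::any for the payload.

```cpp
#include <any>

struct Type;

// A piece of data: the payload plus a pointer to its shared type descriptor.
struct Data {
    const Type* type;   // not owned; points at the single descriptor for this type
    std::any payload;
};

// The descriptor holds one function pointer per supported operation.
struct Type {
    const char* name;
    Data (*addition)(const Data&, const Data&);
};

// Generic addition for payloads of type T.
// std::any_cast throws std::bad_any_cast if either payload is not a T.
template<class T>
Data add(const Data& l, const Data& r) {
    return Data{l.type, std::any_cast<T>(l.payload) + std::any_cast<T>(r.payload)};
}

// One IntType for the whole program; every int value points at it.
const Type& IntType() {
    static const Type t{"int", &add<int>};
    return t;
}

// Hypothetical helpers.
Data make_int(int v) { return Data{&IntType(), v}; }

// Dispatches on the left operand's type only; mixed types are not handled here.
Data add_values(const Data& l, const Data& r) { return l.type->addition(l, r); }
```

Copying a `Data` copies only a pointer for the type information, so the descriptor itself is never duplicated.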
To actually store data, I have used my own Any structure, which is very similar to std::any (I wrote it around 2014). You may use std::any for the data store, but you have to be careful about types that cannot be copied.
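std::any requires the stored type to be copy constructible, so a non-copyable object cannot be placed in it directly. One possible workaround, sketched below, is to store a std::shared_ptr to the object instead. The `Handle` type and the helpers `wrap_handle` and `handle_id` are hypothetical names used only for this example.

```cpp
#include <any>
#include <memory>
#include <type_traits>

// Hypothetical non-copyable resource owned by the VM.
struct Handle {
    int id;
    explicit Handle(int i) : id(i) {}
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};

static_assert(!std::is_copy_constructible_v<Handle>,
              "Handle cannot be stored in std::any directly");

// Store a shared_ptr instead: copying the std::any copies the pointer,
// not the Handle itself.
std::any wrap_handle(int id) { return std::make_shared<Handle>(id); }

// Reads the id back; throws std::bad_any_cast if a does not hold a Handle pointer.
int handle_id(const std::any& a) {
    return std::any_cast<const std::shared_ptr<Handle>&>(a)->id;
}
```

The trade-off is that copies of such a value share one underlying object rather than each owning a separate one.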
If you want to see the project yourself, it is publicly available here. You should get the 4.x-dev branch. The system is in Source/Scripting. Good luck.