"Introduction"
I'm relatively new to C++. I've worked through the basics and managed to build two or three simple interpreters for my own programming languages.
The first thing that gave me a headache, and still does, is implementing my language's type system in C++.
Think about it: Ruby, Python, PHP and co. have a lot of built-in types, which are obviously implemented in C. So the first thing I tried was to let a value in my language have one of three possible types: Int, String and Nil.
I came up with this:
enum ValueType
{
    Int, String, Nil
};
class Value
{
public:
    ValueType type;
    int intVal;
    string stringVal;
};
Yeah, wow, I know. Passing this class around was extremely slow, because the string was copied, and its allocator called, every time.
Next I tried something like this:
enum ValueType
{
    Int, String, Nil
};
extern string stringTable[255];
class Value
{
public:
    ValueType type;
    int index;
};
I would store all strings in stringTable and write their position to index. If the type of the Value was Int, I just stored the integer itself in index; it wouldn't make sense to use an int index to access another int, would it?
Anyway, this approach gave me a headache too. After a while, accessing the string from the table here, referencing it there and copying it over there got out of hand, and I lost control. I had to put the interpreter draft aside.
Now: Okay, so C and C++ are statically typed.
How do the main implementations of the languages mentioned above handle the different types in their programs (fixnums, bignums, nums, strings, arrays, resources,...)?
What should I do to get maximum speed with many different available types?
How do the solutions compare to my simplified versions above?
Solution 2
There are a couple of different things that you can do here. Different solutions have emerged over time, and most of them require dynamic allocation of the actual datum (boost::variant can avoid dynamically allocated memory for small objects; thanks @MSalters).
Pure C approach:
Store type information and a void pointer to memory that has to be interpreted according to the type information (usually an enum):
#include <stdlib.h>  /* malloc, free */

typedef enum {
    integer,
    string,
    null
} type_t;

typedef struct variable {
    type_t type;
    void * datum;   /* interpreted according to type */
} variable_t;

void init_int_variable( variable_t * var, int value )
{
    var->type = integer;
    var->datum = malloc( sizeof(int) );
    if ( var->datum != NULL )
        *((int *)var->datum) = value;   /* cast to int *, then dereference */
}

void fini_variable( variable_t var ) // optionally by pointer
{
    free( var.datum );
}
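The struct above pays for a heap allocation even for a plain int. A common alternative is a tagged union, where small values live inside the value itself and only larger payloads go on the heap. The following is only a sketch of that idea, not taken from any particular interpreter; all names (Tag, Value, make_int, make_string, release, add) are made up for illustration, and ownership of the string is deliberately manual.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout; this is an illustration, not a library API.
enum Tag { TAG_INT, TAG_STRING, TAG_NIL };

// The payload shares storage: an int is kept inline, a string on the heap.
// Copying a Value copies the pointer only, so ownership is manual here.
struct Value {
    Tag tag;
    union {
        int          i;  // valid when tag == TAG_INT
        std::string *s;  // valid when tag == TAG_STRING
    } as;
};

inline Value make_nil() { Value v; v.tag = TAG_NIL; v.as.s = 0; return v; }
inline Value make_int(int n) { Value v; v.tag = TAG_INT; v.as.i = n; return v; }
inline Value make_string(const std::string &text) {
    Value v;
    v.tag = TAG_STRING;
    v.as.s = new std::string(text);
    return v;
}

// Frees the heap payload, if there is one, and resets the value to nil.
inline void release(Value &v) {
    if (v.tag == TAG_STRING) delete v.as.s;
    v.tag = TAG_NIL;
    v.as.s = 0;
}

// Checked accessors: the tag must match the requested type.
inline int as_int(const Value &v) {
    assert(v.tag == TAG_INT);
    return v.as.i;
}
inline const std::string &as_string(const Value &v) {
    assert(v.tag == TAG_STRING);
    return *v.as.s;
}

// Adds two ints without touching the heap; any other combination yields nil.
inline Value add(const Value &a, const Value &b) {
    if (a.tag == TAG_INT && b.tag == TAG_INT)
        return make_int(a.as.i + b.as.i);
    return make_nil();
}
```

With this layout, integer arithmetic never allocates, and passing a Value around copies only a tag plus a few bytes. The price is that whoever creates a string value is responsible for calling release on it exactly once.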
In C++ you can improve this approach by using classes to simplify the usage, but more importantly you can go for more complex solutions and use existing libraries such as boost::any or boost::variant, which offer different solutions to the same problem.
boost::any stores the value in dynamically allocated memory, behind a pointer to a base class in a small hierarchy, and casts back down to the concrete type when the value is retrieved. boost::variant instead keeps the value inside the variant object itself, in storage sized for its largest alternative, which is how it can avoid dynamic allocation.
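To give a feel for the variant style, here is a small sketch. It uses std::variant, the similar facility in the C++17 standard library, rather than boost::variant itself, so it assumes a C++17 compiler. The names Value, ToString, render and add are invented for this example, and using std::monostate for Nil is just one possible choice.

```cpp
#include <string>
#include <variant>

// Hypothetical alias: std::monostate plays the role of Nil.
using Value = std::variant<std::monostate, int, std::string>;

// Visitor that renders whichever alternative is currently held.
struct ToString {
    std::string operator()(std::monostate) const { return "nil"; }
    std::string operator()(int n) const { return std::to_string(n); }
    std::string operator()(const std::string &s) const { return s; }
};

inline std::string render(const Value &v) {
    return std::visit(ToString{}, v);
}

// Adds two ints or concatenates two strings; anything else yields nil.
inline Value add(const Value &a, const Value &b) {
    if (const int *x = std::get_if<int>(&a)) {
        if (const int *y = std::get_if<int>(&b)) {
            return Value(*x + *y);
        }
    }
    if (const std::string *x = std::get_if<std::string>(&a)) {
        if (const std::string *y = std::get_if<std::string>(&b)) {
            return Value(*x + *y);
        }
    }
    return Value();  // default-constructs the first alternative (nil)
}
```

The variant tracks which alternative is active and destroys it correctly, so there is no manual tag bookkeeping or explicit free step; the trade-off is that every Value is as large as its largest alternative.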
One obvious solution is to define a type hierarchy:
class Type
{
};
class Int : public Type
{
};
class String : public Type
{
};
and so on. As a complete example, let us write an interpreter for a tiny language. The language allows declaring variables like this:
var a 10
That will create an Int object, assign it the value 10 and store it in the variable table under the name a. Operations can be invoked on variables. For instance, the addition operation on two Int values looks like:
+ a b
Here is the complete code for the interpreter:
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <cstdlib>
#include <map>
#include <typeinfo> // std::bad_cast

// The base Type object from which all data types are derived.
class Type
{
public:
  typedef std::vector<Type*> TypeVector;
  virtual ~Type () { }
  // Some functions that you may want all types of objects to support:
  // Returns the string representation of the object.
  virtual const std::string toString () const = 0;
  // Returns true if other_obj is the same as this.
  virtual bool equals (const Type &other_obj) = 0;
  // Invokes an operation on this object with the objects in args
  // as arguments.
  virtual Type* invoke (const std::string &opr, const TypeVector &args) = 0;
};

// An implementation of Type to represent an integer. The C++ int is
// used to actually store the value. As a consequence this type is
// machine dependent, which might not be what you want for a real
// high-level language.
class Int : public Type
{
public:
  Int () : value_ (0), ret_ (NULL) { }
  Int (int v) : value_ (v), ret_ (NULL) { }
  Int (const std::string &v) : value_ (atoi (v.c_str ())), ret_ (NULL) { }
  virtual ~Int ()
  {
    delete ret_;
  }
  virtual const std::string toString () const
  {
    std::ostringstream out;
    out << value_;
    return out.str ();
  }
  virtual bool equals (const Type &other_obj)
  {
    if (&other_obj == this)
      return true;
    try
      {
        const Int &i = dynamic_cast<const Int&> (other_obj);
        return value_ == i.value_;
      }
    catch (const std::bad_cast &)
      {
        // other_obj is not an Int.
        return false;
      }
  }
  // As of now, Int supports only addition, represented by '+'.
  virtual Type* invoke (const std::string &opr, const TypeVector &args)
  {
    if (opr == "+")
      {
        return add (args);
      }
    return NULL;
  }
private:
  // Adds args[0] to this object. The result object is owned by this
  // Int and reused, so the returned pointer is only valid until the
  // next addition on this object. Returns NULL if the argument is
  // missing or is not an Int.
  Type* add (const TypeVector &args)
  {
    if (args.empty ()) return NULL;
    Int *arg = dynamic_cast<Int*> (args[0]);
    if (arg == NULL) return NULL;
    if (ret_ == NULL) ret_ = new Int;
    ret_->value_ = value_ + arg->value_;
    return ret_;
  }
  int value_;
  Int *ret_;
};

// We use std::map as a symbol (or variable) table.
typedef std::map<std::string, Type*> VarsTable;
typedef std::vector<std::string> Tokens;

// A simple tokenizer for our language. Takes a line and
// tokenizes it based on whitespaces.
static void
tokenize (const std::string &line, Tokens &tokens)
{
  std::istringstream in (line);
  std::string token;
  // Testing the extraction itself avoids pushing an empty token
  // when the line ends in whitespace.
  while (in >> token)
    tokens.push_back (token);
}

// Maps varName to an Int object in the symbol table. To support
// other Types, we need a more complex interpreter that actually infers
// the type of object by looking at the format of value.
static void
setVar (const std::string &varName, const std::string &value,
        VarsTable &vars)
{
  Type *&slot = vars[varName]; // NULL if the name is new
  delete slot;                 // free a previously bound value, if any
  slot = new Int (value);
}

// Returns a previously mapped value from the symbol table.
static Type *
getVar (const std::string &varName, const VarsTable &vars)
{
  VarsTable::const_iterator iter = vars.find (varName);
  if (iter == vars.end ())
    {
      std::cout << "Variable " << varName
                << " not found." << std::endl;
      return NULL;
    }
  return iter->second;
}

// Invokes opr on the object mapped to the name var01.
// opr should represent a binary operation. var02 will
// be pushed to the args vector. The string representation of
// the result is printed to the console.
static void
invoke (const std::string &opr, const std::string &var01,
        const std::string &var02, const VarsTable &vars)
{
  Type::TypeVector args;
  Type *arg01 = getVar (var01, vars);
  if (arg01 == NULL) return;
  Type *arg02 = getVar (var02, vars);
  if (arg02 == NULL) return;
  args.push_back (arg02);
  Type *ret = NULL;
  if ((ret = arg01->invoke (opr, args)) != NULL)
    std::cout << "=> " << ret->toString () << std::endl;
  else
    std::cout << "Failed to invoke " << opr << " on "
              << var01 << std::endl;
}

// A simple REPL for our language. Type 'quit' to exit
// the loop.
int
main ()
{
  VarsTable vars;
  std::string line;
  while (std::getline (std::cin, line))
    {
      if (line == "quit")
        break;
      else
        {
          Tokens tokens;
          tokenize (line, tokens);
          if (tokens.size () != 3)
            {
              std::cout << "Invalid expression." << std::endl;
              continue;
            }
          if (tokens[0] == "var")
            setVar (tokens[1], tokens[2], vars);
          else
            invoke (tokens[0], tokens[1], tokens[2], vars);
        }
    }
  // Free the objects still held by the symbol table.
  for (VarsTable::iterator it = vars.begin (); it != vars.end (); ++it)
    delete it->second;
  return 0;
}
A sample interaction with the interpreter:
/home/me $ ./mylang
var a 10
var b 20
+ a b
=> 30
+ a c
Variable c not found.
quit
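The comment on setVar mentions that supporting more types means inferring the type from the text of the value. As a minimal sketch of that one step, assuming a made-up rule (an optional sign followed only by digits means Int, everything else is treated as String), something like the following could decide which class to instantiate. Kind and inferKind are hypothetical names, not part of the interpreter above.

```cpp
#include <cctype>
#include <string>

// Hypothetical result type for the inference step.
enum Kind { KIND_INT, KIND_STRING };

// Classifies a token: optional sign plus at least one digit and nothing
// else is an Int; everything else (including "") is a String.
inline Kind inferKind(const std::string &text) {
    std::string::size_type i = 0;
    if (i < text.size() && (text[i] == '-' || text[i] == '+'))
        ++i;
    if (i == text.size())
        return KIND_STRING;  // empty token or a lone sign
    for (; i < text.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(text[i])))
            return KIND_STRING;
    }
    return KIND_INT;
}
```

setVar could then switch on the result and create an Int or, once such a class exists, a String.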