I've built an interpreter in C++ for a language of my own design.
One problem with the design is that my language has two different types, number and string, so I have to pass around a struct like this:
class myInterpreterValue
{
  myInterpreterType type;
  int intValue;
  string strValue;
};
Objects of this class are passed around millions of times a second, e.g. during a countdown loop in my language.
Profiling showed that 85% of the run time is spent in the allocation function of the string template.
The cause is clear to me: my interpreter is badly designed and doesn't use pointers enough. Yet I have no alternative: in most cases I can't use pointers, because I have to make copies.
What can I do about this? Would a class like this be a better idea?
vector<string> strTable;
vector<int> intTable;
class myInterpreterValue
{
  myInterpreterType type;
  int locationInTable;
};
So the class only knows which type it represents and its position in the table.
This has disadvantages too, however: I'd have to add temporary values to the string/int tables and then remove them again, which would also cost a lot of performance.
I suspect many values aren't strings, so the first thing you can do is get rid of the string object when you don't need it: put it into a union. Also, many of your strings are probably small, so you can avoid heap allocation by storing small strings in the object itself. LLVM has the SmallString template for that. Then you can use string interning, as another answer also suggests. LLVM has the StringPool class for that: call intern("foo") and get a smart pointer referring to a shared string that may also be used by other myInterpreterValue objects.
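A minimal sketch of the "store small strings in the object itself" idea, assuming C++17 (std::variant, std::string_view). The InlineStr type, its 15-character capacity and the isInline() helper are hypothetical choices for illustration; this is not LLVM's SmallString API.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical fixed-capacity string kept inside the value itself,
// so short strings never touch the heap.
struct InlineStr {
    static constexpr std::size_t kCap = 15;  // assumed capacity, tune as needed
    char data[kCap];
    unsigned char len;
    std::string_view view() const { return std::string_view(data, len); }
};

class myInterpreterValue {
public:
    myInterpreterValue(int i) : value(i) {}
    myInterpreterValue(std::string_view s) {
        if (s.size() <= InlineStr::kCap) {
            InlineStr small{};                          // zero-initialized buffer
            std::copy(s.begin(), s.end(), small.data);  // copy bytes, no allocation
            small.len = static_cast<unsigned char>(s.size());
            value = small;
        } else {
            value = std::string(s);                     // long string: heap fallback
        }
    }
    bool isInline() const { return std::holds_alternative<InlineStr>(value); }
    std::string_view asString() const {
        if (const InlineStr* p = std::get_if<InlineStr>(&value)) return p->view();
        return std::get<std::string>(value);  // throws std::bad_variant_access for ints
    }
private:
    std::variant<int, InlineStr, std::string> value;
};
```

Copying a value that holds a short string is then a plain byte copy. Note that many standard library implementations (libstdc++, libc++, MSVC) already apply a small-string optimization inside std::string itself, so it is worth profiling with your actual string lengths before adding your own.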
The union can be written like this:
#include <boost/variant.hpp>
class myInterpreterValue {
  boost::variant<int, string> value;
};
boost::variant does the type tagging for you. If you don't have Boost, you can implement it yourself like this. At the time of writing, alignment couldn't be obtained portably in C++ (C++11 later added alignof and alignas for this), so we put some types that may require large alignment into the storage union.
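#include <new>     // placement new
#include <string>
using std::string;
class myInterpreterValue {
  union Storage {
    // for getting alignment
    long double ld_;
    long long ll_;
    // for getting size
    int i1;
    char s1[sizeof(string)];
    // for access
    char c;
  };
  enum type { IntValue, StringValue } m_type;
  Storage m_store;
  int *getIntP() { return reinterpret_cast<int*>(&m_store.c); }
  const int *getIntP() const { return reinterpret_cast<const int*>(&m_store.c); }
  string *getStringP() { return reinterpret_cast<string*>(&m_store.c); }
  const string *getStringP() const { return reinterpret_cast<const string*>(&m_store.c); }
  // construct a copy of other's payload in our (empty) storage;
  // m_type is set only after construction succeeds
  void copyFrom(myInterpreterValue const& other) {
    if(other.m_type == StringValue) {
      new (static_cast<void*>(&m_store.c)) string(*other.getStringP());
    } else {
      new (static_cast<void*>(&m_store.c)) int(*other.getIntP());
    }
    m_type = other.m_type;
  }
  void destroy() {
    if(m_type == StringValue) {
      getStringP()->~string(); // call destructor
    }
  }
public:
  myInterpreterValue(string const& str) {
    m_type = StringValue;
    new (static_cast<void*>(&m_store.c)) string(str);
  }
  myInterpreterValue(int i) {
    m_type = IntValue;
    new (static_cast<void*>(&m_store.c)) int(i);
  }
  // the implicit copy operations would copy the string's raw bytes and
  // destroy the same heap buffer twice, so define them explicitly
  myInterpreterValue(myInterpreterValue const& other) { copyFrom(other); }
  myInterpreterValue& operator=(myInterpreterValue const& other) {
    if(this != &other) {
      destroy();
      m_type = IntValue; // safe "empty" state in case the string copy throws
      copyFrom(other);
    }
    return *this;
  }
  ~myInterpreterValue() { destroy(); }
  string &asString() { return *getStringP(); }
  int &asInt() { return *getIntP(); }
};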
You get the idea.
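For reference, C++17 added std::variant to the standard library, which does the same type tagging as boost::variant and takes care of the copy/destroy bookkeeping of the hand-written union. A minimal sketch, assuming a C++17 compiler; the accessor names isString(), asInt() and asString() are hypothetical:

```cpp
#include <string>
#include <utility>
#include <variant>

class myInterpreterValue {
public:
    myInterpreterValue(int i) : value(i) {}
    myInterpreterValue(std::string s) : value(std::move(s)) {}

    bool isString() const { return std::holds_alternative<std::string>(value); }
    // std::get throws std::bad_variant_access if the wrong type is requested
    int asInt() const { return std::get<int>(value); }
    const std::string& asString() const { return std::get<std::string>(value); }

private:
    std::variant<int, std::string> value;  // tag + storage managed by the library
};
```

Copy construction and assignment are generated correctly by std::variant, so copying an int value never touches the heap; only string values still allocate on copy.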
I think some dynamic languages cache all equivalent strings at runtime with a hash lookup and store only pointers. So in each iteration of the loop where the string stays the same, there would be just a pointer assignment, or at most one hash computation. I know some languages (Smalltalk, I think?) do this not only with strings but also with small numbers. See the Flyweight pattern.
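A minimal sketch of interning with a hash lookup. StringPool, intern() and the ofInt()/ofStr() factories here are hypothetical names for illustration, not LLVM's StringPool API. It relies on the fact that pointers to elements of a std::unordered_set stay valid across rehashing:

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical intern pool: each distinct string is stored exactly once.
class StringPool {
public:
    const std::string* intern(const std::string& s) {
        // insert() returns the existing element if an equal one is already present
        return &*pool_.insert(s).first;
    }
    std::size_t size() const { return pool_.size(); }
private:
    std::unordered_set<std::string> pool_;
};

// Trivially copyable value: copying it is just copying a tag and an int/pointer.
struct myInterpreterValue {
    enum Type { Int, Str } type;
    union { int i; const std::string* s; };
    static myInterpreterValue ofInt(int x) { myInterpreterValue v; v.type = Int; v.i = x; return v; }
    static myInterpreterValue ofStr(const std::string* p) { myInterpreterValue v; v.type = Str; v.s = p; return v; }
};
```

As a side effect, two interned strings are equal exactly when their pointers are equal, so string comparison in the interpreter becomes a pointer comparison. The pool never frees entries in this sketch; a real interpreter would need some form of reference counting or garbage collection for it.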
I'm not an expert on this one. If that doesn't help, you should post the loop code and walk us through how it's being interpreted.