C++ interpreter conceptual problem

I've built an interpreter in C++ for a language created by me.

One main problem in the design is that the language has two different types, number and string, so I have to pass around a class like this:

class myInterpreterValue
{
 myInterpreterType type;
 int intValue;
 string strValue;
};

Objects of this class are passed around millions of times a second, e.g. during a countdown loop in my language.

Profiling shows that 85% of the run time is spent in the allocation function of the string template.

This is pretty clear to me: my interpreter has a bad design and doesn't use pointers enough. Yet I don't see an alternative: in most cases I can't use pointers because I really do have to make copies.

What can I do about this? Would a class like this be a better idea?

vector<string> strTable;
vector<int> intTable;
class myInterpreterValue
{
 myInterpreterType type;
 int locationInTable;
};

So the class only knows which type it represents and its position in the table.

This has disadvantages of its own, however: I'd have to add temporary values to the string/int tables and then remove them again, which would cost a lot of performance too.

  • How do interpreters of languages like Python or Ruby do this? They also need some struct that represents a value in the language, something that can be either an int or a string.
Jan Wilkins asked Apr 17 '10 23:04


2 Answers

I suspect many values aren't strings. So the first thing you can do is get rid of the string object when you don't need it: put it into a union. Another thing is that many of your strings are probably short, so you can avoid heap allocation by storing small strings in the object itself. LLVM has the SmallString template for that. And then you can use string interning, as another answer also says. LLVM has the StringPool class for that: call intern("foo") and get a smart pointer referring to a shared string that may also be used by other myInterpreterValue objects.

The union can be written like this:

#include <boost/variant.hpp>
#include <string>

class myInterpreterValue {
 boost::variant<int, string> value;
};

boost::variant does the type tagging for you. If you don't have Boost, you can implement it yourself like this. Before C++11 introduced alignof and alignas, the required alignment couldn't be obtained portably in C++, so we push some types that may require a large alignment into the storage union.

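#include <new>     // placement new
#include <string>
using std::string;

class myInterpreterValue {
 union Storage {
   // for getting alignment
   long double ld_;
   long long ll_;

   // for getting size
   int i1;
   char s1[sizeof(string)];

   // for access
   char c;
 };
 enum type { IntValue, StringValue } m_type;

 Storage m_store;
 int *getIntP() { return reinterpret_cast<int*>(&m_store.c); }
 string *getStringP() { return reinterpret_cast<string*>(&m_store.c); }
 int const *getIntP() const { return reinterpret_cast<int const*>(&m_store.c); }
 string const *getStringP() const { return reinterpret_cast<string const*>(&m_store.c); }

 // copy-construct the active member of other into our storage
 void copyFrom(myInterpreterValue const& other) {
   if(other.m_type == StringValue)
     new (static_cast<void*>(&m_store.c)) string(*other.getStringP());
   else
     new (static_cast<void*>(&m_store.c)) int(*other.getIntP());
   m_type = other.m_type;
 }

 // destroy the active member; only the string needs a destructor call
 void destroy() {
   if(m_type == StringValue) {
     getStringP()->~string(); // call destructor
   }
   m_type = IntValue; // storage no longer holds a string
 }

public:
  myInterpreterValue(string const& str) {
    m_type = StringValue;
    new (static_cast<void*>(&m_store.c)) string(str);
  }

  myInterpreterValue(int i) {
    m_type = IntValue;
    new (static_cast<void*>(&m_store.c)) int(i);
  }

  // copying must copy the string itself, not the raw bytes of the union
  myInterpreterValue(myInterpreterValue const& other) { copyFrom(other); }
  myInterpreterValue &operator=(myInterpreterValue const& other) {
    if(this != &other) {
      destroy();
      copyFrom(other);
    }
    return *this;
  }
  ~myInterpreterValue() { destroy(); }

  string &asString() { return *getStringP(); }
  int &asInt() { return *getIntP(); }
};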

You get the idea.
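For comparison, since C++17 the standard library ships std::variant, which does the same type tagging as boost::variant. Here is a minimal sketch, assuming a C++17 compiler; the class name Value and its accessor names are made up for illustration and are not part of the code above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for myInterpreterValue; all names are illustrative only.
class Value {
  std::variant<int, std::string> value_;  // the variant keeps the type tag itself

public:
  Value(int i) : value_(i) {}
  Value(std::string s) : value_(std::move(s)) {}

  bool isInt() const { return std::holds_alternative<int>(value_); }
  bool isString() const { return std::holds_alternative<std::string>(value_); }

  // std::get throws std::bad_variant_access if the wrong type is requested.
  int asInt() const { return std::get<int>(value_); }
  const std::string &asString() const { return std::get<std::string>(value_); }
};

// Usage: an int value never constructs a string, and copying a string value
// copies the string properly.
inline void variantExample() {
  Value counter(10);
  Value name(std::string("countdown"));
  Value copy = name;
  assert(counter.isInt() && counter.asInt() == 10);
  assert(copy.isString() && copy.asString() == "countdown");
}
```

As with the hand-written union, a value holding an int costs no string construction at all, so a pure countdown loop should no longer hit the allocator.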

Johannes Schaub - litb answered Sep 24 '22 07:09


I think some dynamic languages cache all equivalent strings at runtime with a hash lookup and only store pointers. In each iteration of a loop where the string stays the same, there would therefore be just a pointer assignment, or at most one call to the string hashing function. I know some languages (Smalltalk, I think?) do this not only with strings but also with small numbers. See the Flyweight pattern.

IANAE on this one. If that doesn't help, you should give the loop code and walk us through how it's being interpreted.
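The string caching described above could look roughly like this in C++. This is only a sketch under my own assumptions, not how any particular interpreter does it; StringInterner, intern and internExample are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical interning table: keeps exactly one copy of each distinct string.
class StringInterner {
  std::unordered_set<std::string> pool_;

public:
  // Returns a pointer to the shared copy. The pointer stays valid for the
  // lifetime of the interner, because std::unordered_set does not move its
  // elements when it rehashes.
  const std::string *intern(const std::string &s) {
    return &*pool_.insert(s).first;
  }

  std::size_t size() const { return pool_.size(); }
};

// Usage: equal strings map to the same pointer, so an interpreter value only
// has to store and copy a pointer, and equality is a pointer comparison.
inline void internExample() {
  StringInterner pool;
  const std::string *a = pool.intern("countdown");
  const std::string *b = pool.intern(std::string("count") + "down");
  assert(a == b);            // same shared string
  assert(pool.size() == 1);  // stored only once
}
```

With this, the cost of the hash lookup is paid once when a string enters the interpreter; passing the value around afterwards copies one pointer instead of allocating.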

Jesse Millikan answered Sep 24 '22 07:09