
Parsing code files faster

I wrote a fairly complex parser for a stack-based language. It loads a file into memory and then compares each token against known names to decide whether it is an operand or an instruction.

Every time I have to parse a new operand/instruction, I std::copy the memory from the file buffer to a std::string and then do:

if(parsed_string.compare("add") == 0) { /* handle addition */ } 
else if(parsed_string.compare("sub") == 0) { /* handle subtraction */ } 
else { /* This is an operand */ }

Unfortunately, all these copies are making the parsing slow.

How should I handle this to avoid all these copies? I always thought I didn't need a tokenizer since the language itself and its logic are pretty simple.

Edit: I'm adding the code where I get the copies for the various operands and instructions

  // This function accounts for 70% of the total time of the program
  std::string Parser::read_as_string(size_t start, size_t end) {

    std::vector<char> file_memory(end - start);
    read_range(start, end - start, file_memory);
    std::string result(file_memory.data(), file_memory.size());
    return std::move(result); // Intended to be consumed
  }

  // Takes a std::vector<char>& so that the call in read_as_string above compiles
  void Parser::read_range(size_t start, size_t size, std::vector<char>& destination) {

    if (destination.size() < size)
      destination.resize(size); // Allocate necessary space

    std::copy(file_in_memory.begin() + start,
      file_in_memory.begin() + start + size,
      destination.begin());
  }

Dean asked Dec 04 '15 17:12


1 Answer

This copying is not necessary. You can operate on slices.

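#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct StrSlice {
  // The slice borrows from embracingStr; it neither copies nor owns the characters
  StrSlice(const std::string& embracingStr, std::size_t startIx, std::size_t length)
  : begin_(embracingStr.data() + startIx), end_(begin_ + length)
  {
    assert(startIx <= embracingStr.size() && length <= embracingStr.size() - startIx);
  }

  StrSlice(const char* begin, const char* end)
  : begin_(begin), end_(end) 
  {}
  // Define some more constructors
  // Be careful about implicit conversions
  //...

  // Small helper used by the comparisons below
  std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }

  // Define lots of comparison routines with other strings here
  bool operator==(const char* str) const {
    // str is assumed to be null-terminated
    return std::strlen(str) == size() && std::equal(begin_, end_, str);
  }

  bool operator==(const StrSlice& str) const {
    return size() == str.size() && std::equal(begin_, end_, str.begin_);
  } 

  // You can take a slice of a slice in O(1) time
  StrSlice subslice(std::size_t startIx, std::size_t length) const {
    assert(startIx <= size() && length <= size() - startIx);
    const char* subsliceBegin = begin_ + startIx;
    const char* subsliceEnd = subsliceBegin + length;
    return StrSlice(subsliceBegin, subsliceEnd); 
  }
private:
  const char* begin_;
  const char* end_;
}; 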

I hope you get the idea. Of course, a slice becomes invalid after any change to the underlying string, especially one that reallocates its memory. But it seems your string doesn't change unless you read a new file.
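
If a C++17 compiler is available, std::string_view is a standard-library type built on the same idea (a pointer plus a length, with no ownership), so it could stand in for a hand-written slice. The following is only a minimal sketch under that assumption, and it also assumes tokens are separated by whitespace. The names Kind, classify, next_token and classify_all are hypothetical and are not part of the parser in the question.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token categories, mirroring the if/else chain in the question
enum class Kind { Add, Sub, Operand };

// Compare the view against known instruction names; no std::string is built
Kind classify(std::string_view token) {
  if (token == "add") return Kind::Add;
  if (token == "sub") return Kind::Sub;
  return Kind::Operand;
}

// Return the next whitespace-delimited token at or after pos and advance pos
// past it. An empty view means the end of the buffer was reached.
std::string_view next_token(std::string_view buffer, std::size_t& pos) {
  const char* ws = " \t\r\n";
  const std::size_t start = buffer.find_first_not_of(ws, pos);
  if (start == std::string_view::npos) {
    pos = buffer.size();
    return {};
  }
  std::size_t end = buffer.find_first_of(ws, start);
  if (end == std::string_view::npos) end = buffer.size();
  pos = end;
  return buffer.substr(start, end - start);  // O(1), refers into the buffer
}

// Classify every token; the views stay valid only while file_in_memory
// is neither modified nor destroyed
std::vector<Kind> classify_all(const std::string& file_in_memory) {
  std::vector<Kind> kinds;
  const std::string_view buffer(file_in_memory);
  std::size_t pos = 0;
  for (std::string_view tok = next_token(buffer, pos); !tok.empty();
       tok = next_token(buffer, pos)) {
    kinds.push_back(classify(tok));
  }
  return kinds;
}
```

The same lifetime caveat applies here: a view must not outlive, or be used after a change to, the buffer it points into.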

Minor Threat answered Sep 22 '22 19:09