I wrote a fairly complex parser for a stack-based language. It loads a file into memory and then compares tokens to decide whether each one is an operand or an instruction.
Every time I have to parse a new operand/instruction, I std::copy the memory from the file buffer into a std::string and then do:
if(parsed_string.compare("add") == 0) { /* handle addition */ }
else if(parsed_string.compare("sub") == 0) { /* handle subtraction */ }
else { /* This is an operand */ }
Unfortunately, all these copies are making the parsing slow.
How can I avoid them? I always assumed I didn't need a tokenizer, since both the language and the parsing logic are pretty simple.
Edit: here is the code that makes the copies for the various operands and instructions:
// This function accounts for 70% of the total time of the program
std::string Parser::read_as_string(size_t start, size_t end) {
    std::string result;
    // Copies the characters [start, end) out of the file buffer
    read_range(start, end - start, result);
    return result; // Return by value; std::move here would only block copy elision
}
void Parser::read_range(size_t start, size_t size, std::string& destination) {
    // Make the destination exactly `size` characters long, so no stale
    // characters from a previous, longer token are left at the end
    destination.resize(size);
    std::copy(file_in_memory.begin() + start,
              file_in_memory.begin() + start + size,
              destination.begin());
}
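Since C++17 the standard library already ships a slice type, std::string_view, so the copying function could return a view instead of a new string. The sketch below is a minimal illustration under assumptions: the `Parser` struct, its `file_in_memory` member being a std::string, and the name `read_as_view` are hypothetical stand-ins for the code above, not the asker's actual class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical minimal parser: only the part needed to show the idea.
struct Parser {
    std::string file_in_memory; // whole file, loaded once and not modified afterwards

    // Returns a non-owning view of [start, end) inside file_in_memory.
    // No allocation and no character copy: just a pointer and a length.
    std::string_view read_as_view(std::size_t start, std::size_t end) const {
        assert(start <= end && end <= file_in_memory.size());
        return std::string_view(file_in_memory).substr(start, end - start);
    }
};
```

The returned view compares directly against string literals (`view == "add"`), so the if/else chain from the question keeps working without any intermediate std::string.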
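This copying is not necessary. You can operate on slices: lightweight views that point into the buffer that already holds the file, instead of copying characters out of it.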
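#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct StrSlice {
    // View `length` characters of `embracingStr` starting at `startIx` (no copy)
    StrSlice(const std::string& embracingStr, std::size_t startIx, std::size_t length)
        : begin_(embracingStr.data() + startIx), end_(begin_ + length)
    {
        assert(startIx <= embracingStr.size() && length <= embracingStr.size() - startIx);
    }
    StrSlice(const char* begin, const char* end)
        : begin_(begin), end_(end)
    {}
    // Define some more constructors
    // Be careful about implicit conversions
    //...
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    // Define lots of comparison routines with other strings here
    bool operator==(const char* str) const {
        // Equal only if the lengths match and every character matches
        return std::strlen(str) == size() && std::memcmp(begin_, str, size()) == 0;
    }
    bool operator==(const StrSlice& str) const {
        return size() == str.size() && std::memcmp(begin_, str.begin_, size()) == 0;
    }
    // You can take a slice of a slice in O(1) time
    StrSlice subslice(std::size_t startIx, std::size_t length) const {
        assert(startIx <= size() && length <= size() - startIx); // range checks
        const char* subsliceBegin = begin_ + startIx;
        const char* subsliceEnd = subsliceBegin + length;
        return StrSlice(subsliceBegin, subsliceEnd);
    }
private:
    const char* begin_; // declared before end_, so it is initialized first
    const char* end_;
};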
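To show how slices remove the copies from the question's dispatch, here is a sketch that uses std::string_view (the standard C++17 counterpart of StrSlice) for both tokenizing and the add/sub comparison. The names `Kind`, `tokenize` and `classify` are hypothetical, and it assumes tokens are separated by whitespace, which the question does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token kinds for the example language.
enum class Kind { Add, Sub, Operand };

// Splits `src` on whitespace into views. The vector itself allocates,
// but no characters are copied: every token points into `src`.
std::vector<std::string_view> tokenize(std::string_view src) {
    std::vector<std::string_view> tokens;
    std::size_t pos = 0;
    while (pos < src.size()) {
        std::size_t start = src.find_first_not_of(" \t\r\n", pos);
        if (start == std::string_view::npos) break;
        std::size_t end = src.find_first_of(" \t\r\n", start);
        if (end == std::string_view::npos) end = src.size();
        tokens.push_back(src.substr(start, end - start));
        pos = end;
    }
    return tokens;
}

// Same if/else chain as in the question, comparing views instead of copies.
Kind classify(std::string_view token) {
    if (token == "add") return Kind::Add;
    if (token == "sub") return Kind::Sub;
    return Kind::Operand;
}
```

A simple whitespace splitter like this is effectively a tiny tokenizer, and it costs little because it only records positions.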
I hope you get the idea. Of course, a slice is invalidated (its pointers dangle) by any change to the associated string, especially one that reallocates its memory. But it seems your string doesn't change unless you read a new file.
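If some token does need to outlive the buffer (for example, a label kept after a new file is read into the same string), one option is to copy just that token into an owning std::string before the buffer changes. A minimal sketch, with the helper name `keep_token` being hypothetical:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// A view (like StrSlice) does not own its characters. Turning it into a
// std::string makes one deliberate copy, only where ownership is needed.
std::string keep_token(std::string_view token) {
    return std::string(token);
}
```

Every other token can stay a view, so the parser pays for a copy only in the rare cases that need one.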