
Fastest quote-escaping implementation?

I'm working on some code that normalizes a lot of data. At the end of processing, a number of key="value" pairs are written out to a file.

The "value" part could be anything, so at the point of output the values must have any embedded quotes escaped as \".

Right now, I'm using the following:

outstream << boost::regex_replace(src, rxquotesearch, quoterepl);
// (where rxquotesearch is  boost::regex("\"")  and quoterepl is "\\\\\"")

However, gprof shows I'm spending most of my execution time in this method, since I have to call it for every value for every line.

I'm curious if there is a faster way than this. I can't use std::replace since I'm replacing one character with two.

Thanks for any advice.

Joe asked Jul 22 '09 01:07


3 Answers

If speed is a concern you should use a hand-written function to do this. Notice the use of reserve() to try to keep memory (re)allocation to a minimum.

string escape_quotes(const string &before)
{
    string after;
    // Reserve room for the input plus a few escapes to limit reallocation.
    after.reserve(before.length() + 4);

    for (string::size_type i = 0; i < before.length(); ++i) {
        switch (before[i]) {
            case '"':
            case '\\':
                // Quotes and backslashes both get a leading backslash.
                after += '\\';
                // Fall through.

            default:
                after += before[i];
        }
    }

    return after;
}
John Kugelman answered Nov 10 '22 21:11


I would not take the source string and build a new output string at all.
Instead, I would iterate through the source string and print each character; if the character is a quote, just print a "\" before printing it.

KPexEA answered Nov 10 '22 21:11


I'm not surprised that the regex is really slow here - you're using a big, general-purpose hammer to pound in a tiny little nail. Of course, if you ended up needing to do something more interesting, the regex might quickly gain the advantage in terms of simplicity.

As for a simpler/faster approach, you could try writing the escaped string into a separate buffer one character at a time. Then it becomes trivial to add the escapes, and you don't waste any time reallocating the string or shifting characters. The biggest difficulty will be managing the size of your buffer, but you could just use a vector for that, and reuse the same vector for each string to avoid repeated allocations. The efficiency gain would depend a lot on the details of how vector works, but you can always boil it down to raw arrays and manual memory management if you need to.

The routine might look something like this, if you used vector:

vector<char> buf;  // reused for every string, so it rarely needs to reallocate
// all_the_strings is assumed to be a vector<string> here
for( vector<string>::const_iterator it = all_the_strings.begin();
     it != all_the_strings.end(); ++it )
{
    buf.clear();
    const string & str = *it;
    for( size_t i = 0; i < str.size(); ++i )
    {
        if( str[i] == '"' || str[i] == '\\' )
            buf.push_back( '\\' );
        buf.push_back( str[i] );
    }
    buf.push_back( '\0' );

    // note: &buf[0] relies on vector storage being contiguous (guaranteed
    // since C++03); an embedded '\0' in str would cut the C string short
    const char * escaped = &buf[0];

    // print escaped string to file here...
}
Charlie answered Nov 10 '22 21:11
