
Most efficient way to escape XML/HTML in C++ string?

I can't believe this question hasn't been asked before. I have a string that needs to be inserted into an HTML file but it may contain special HTML characters. I want to replace these with the appropriate HTML representation.

The code below works but is pretty verbose and ugly. Performance is not critical for my application but I guess there are scalability problems here also. How can I improve this? I guess this is a job for STL algorithms or some esoteric Boost function, but the code below is the best I can come up with myself.

    void escape(std::string *data)
    {
        std::string::size_type pos = 0;
        for (;;)
        {
            pos = data->find_first_of("\"&<>", pos);
            if (pos == std::string::npos) break;
            std::string replacement;
            switch ((*data)[pos])
            {
            case '\"': replacement = "&quot;"; break;
            case '&':  replacement = "&amp;";  break;
            case '<':  replacement = "&lt;";   break;
            case '>':  replacement = "&gt;";   break;
            default: ;
            }
            data->replace(pos, 1, replacement);
            pos += replacement.size();
        }
    }
paperjam asked Apr 14 '11 15:04





2 Answers

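Instead of replacing in place in the original string, you can copy into a new string and do the replacements on the fly, which avoids shifting the tail of the string on every replacement. The copy is linear in the input length, whereas in-place replacement can approach quadratic time when many characters need escaping, and the copy has better cache behavior too, so I'd expect a huge improvement. Alternatively, you can use boost::spirit::xml encode or pugixml (http://code.google.com/p/pugixml/).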

    void encode(std::string& data) {
        std::string buffer;
        buffer.reserve(data.size());
        for(size_t pos = 0; pos != data.size(); ++pos) {
            switch(data[pos]) {
                case '&':  buffer.append("&amp;");       break;
                case '\"': buffer.append("&quot;");      break;
                case '\'': buffer.append("&apos;");      break;
                case '<':  buffer.append("&lt;");        break;
                case '>':  buffer.append("&gt;");        break;
                default:   buffer.append(&data[pos], 1); break;
            }
        }
        data.swap(buffer);
    }

EDIT: A small improvement can be achieved by using a heuristic to choose the size of the buffer. Change the argument of buffer.reserve from data.size() to data.size()*1.1 (10% extra) or something similar, depending on how many replacements are expected.
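
As a rough sketch of the copy-with-replacement idea combined with the reserve heuristic, the variant below returns an escaped copy and appends whole runs of ordinary characters at once instead of one character at a time. The name encode_copy and the run-copying loop are assumptions made for illustration, not part of the answer above, and the 10% figure is only the guess suggested in the edit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (name chosen for this sketch): returns an escaped copy
// of `data` rather than modifying it in place.
std::string encode_copy(const std::string& data) {
    std::string buffer;
    // Heuristic: reserve about 10% extra room for the longer entities.
    // Integer arithmetic avoids a floating-point conversion.
    buffer.reserve(data.size() + data.size() / 10);

    std::size_t start = 0;
    for (;;) {
        // Find the next character that needs escaping.
        const std::size_t pos = data.find_first_of("&\"'<>", start);

        // Copy the run of ordinary characters before it (or the rest of the string).
        const std::size_t count =
            (pos == std::string::npos) ? std::string::npos : pos - start;
        buffer.append(data, start, count);
        if (pos == std::string::npos) break;

        switch (data[pos]) {
            case '&':  buffer.append("&amp;");  break;
            case '"':  buffer.append("&quot;"); break;
            case '\'': buffer.append("&apos;"); break;
            case '<':  buffer.append("&lt;");   break;
            case '>':  buffer.append("&gt;");   break;
        }
        start = pos + 1;
    }
    return buffer;
}
```

If the reserve guess turns out too small, the string simply reallocates as it grows, so the heuristic only affects speed, not correctness.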

Giovanni Funchal answered Sep 28 '22 10:09



    void escape(std::string *data)
    {
        using boost::algorithm::replace_all;
        replace_all(*data, "&",  "&amp;");
        replace_all(*data, "\"", "&quot;");
        replace_all(*data, "\'", "&apos;");
        replace_all(*data, "<",  "&lt;");
        replace_all(*data, ">",  "&gt;");
    }

Could win the prize for least verbose?
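
For readers without Boost, here is a minimal sketch of the same approach using only the standard library. The helper replace_all_std and the function escape_std are hypothetical names introduced for this example; they are a stand-in for, not a copy of, boost::algorithm::replace_all. Note that the ampersand has to be replaced first in this style, otherwise the "&" inside entities produced by earlier passes would be escaped a second time.

```cpp
#include <string>

// Hypothetical stand-in for boost::algorithm::replace_all:
// replaces every occurrence of `from` in `s` with `to`.
void replace_all_std(std::string& s, const std::string& from, const std::string& to) {
    if (from.empty()) return;  // avoid looping forever on an empty pattern
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // skip past the inserted text so it is not rescanned
    }
}

// Hypothetical name; same five passes as the Boost version.
void escape_std(std::string& data) {
    replace_all_std(data, "&",  "&amp;");   // must run first
    replace_all_std(data, "\"", "&quot;");
    replace_all_std(data, "'",  "&apos;");
    replace_all_std(data, "<",  "&lt;");
    replace_all_std(data, ">",  "&gt;");
}
```

Each call scans the whole string, so this makes five passes where the copying version makes one; it trades speed for brevity.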

paperjam answered Sep 28 '22 10:09
