I can't believe this question hasn't been asked before. I have a string that needs to be inserted into an HTML file but it may contain special HTML characters. I want to replace these with the appropriate HTML representation.
The code below works, but it is pretty verbose and ugly. Performance is not critical for my application, but I suspect it has scalability problems as well. How can I improve it? I guess this is a job for STL algorithms or some esoteric Boost function, but the code below is the best I can come up with myself.
```cpp
#include <string>

// Escapes ", &, < and > in place by replacing each one with its HTML entity.
void escape(std::string *data) {
    std::string::size_type pos = 0;
    for (;;) {
        pos = data->find_first_of("\"&<>", pos);
        if (pos == std::string::npos)
            break;
        std::string replacement;
        switch ((*data)[pos]) {
        case '\"': replacement = "&quot;"; break;
        case '&':  replacement = "&amp;";  break;
        case '<':  replacement = "&lt;";   break;
        case '>':  replacement = "&gt;";   break;
        default: ;
        }
        data->replace(pos, 1, replacement);
        // Skip past the inserted entity so its '&' is not escaped again.
        pos += replacement.size();
    }
}
```
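XML escape characters

There are only five:

    "   &quot;
    '   &apos;
    <   &lt;
    >   &gt;
    &   &amp;

Escaping characters depends on where the special character is used. For example, in element content only < and & must always be escaped, whereas inside an attribute value the quote character that delimits the value must be escaped too. The examples can be validated at the W3C Markup Validation Service.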
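The five mappings above can be captured in a small lookup helper. This is a minimal sketch; the name xml_entity is hypothetical and not part of any library. It returns an empty view for characters that need no escaping. Since the question targets HTML, note that &apos; is defined in XML and HTML5 but not in HTML 4, where &#39; is the portable choice.

```cpp
#include <string_view>

// Hypothetical helper: returns the predefined XML entity for one of the five
// special characters, or an empty view if the character needs no escaping.
std::string_view xml_entity(char c) {
    switch (c) {
    case '"':  return "&quot;";
    case '\'': return "&apos;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '&':  return "&amp;";
    default:   return {};
    }
}
```

A caller can append xml_entity(c) when it is non-empty and the character itself otherwise.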
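Instead of replacing in the original string, you can copy into a new buffer and do the replacement on the fly, which avoids moving characters around in the string. Each in-place replace shifts everything after the match, so the original approach is quadratic in the worst case (a string full of special characters), while copying is linear. It also has better cache behavior, so I'd expect a large improvement on big inputs. Or you can use boost::spirit::xml encode or http://code.google.com/p/pugixml/.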
```cpp
#include <cstddef>
#include <string>

// Builds the escaped result in a separate buffer, then swaps it into place.
void encode(std::string& data) {
    std::string buffer;
    buffer.reserve(data.size());
    for (std::size_t pos = 0; pos != data.size(); ++pos) {
        switch (data[pos]) {
        case '&':  buffer.append("&amp;");  break;
        case '\"': buffer.append("&quot;"); break;
        case '\'': buffer.append("&apos;"); break;
        case '<':  buffer.append("&lt;");   break;
        case '>':  buffer.append("&gt;");   break;
        default:   buffer.append(&data[pos], 1); break;
        }
    }
    data.swap(buffer);
}
```
EDIT: A small improvement can be achieved by using a heuristic to size the buffer up front. Replace the buffer.reserve(data.size()) line with buffer.reserve(data.size() * 1.1) (10% extra), or a similar factor depending on how many replacements you expect.
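Rather than guessing a factor, a first pass can count exactly how many extra bytes the entities need, so the buffer is reserved once at its final size. This is a sketch of that alternative (a swapped-in technique, not part of the answer above); the name encode_exact is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical variant of encode(): a first pass computes the exact output
// size so the buffer never reallocates, a second pass fills it.
void encode_exact(std::string& data) {
    std::size_t extra = 0;
    for (char c : data) {
        switch (c) {
        case '&':  extra += 4; break;  // "&amp;"  is 5 chars, replaces 1
        case '\"': extra += 5; break;  // "&quot;" is 6 chars
        case '\'': extra += 5; break;  // "&apos;" is 6 chars
        case '<':  extra += 3; break;  // "&lt;"   is 4 chars
        case '>':  extra += 3; break;  // "&gt;"   is 4 chars
        default:   break;
        }
    }
    if (extra == 0)
        return;  // nothing to escape, leave the string untouched

    std::string buffer;
    buffer.reserve(data.size() + extra);
    for (char c : data) {
        switch (c) {
        case '&':  buffer += "&amp;";  break;
        case '\"': buffer += "&quot;"; break;
        case '\'': buffer += "&apos;"; break;
        case '<':  buffer += "&lt;";   break;
        case '>':  buffer += "&gt;";   break;
        default:   buffer += c;        break;
        }
    }
    data.swap(buffer);
}
```

The trade-off is scanning the input twice; whether that beats a modest over-reservation likely depends on input size and how often special characters occur.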
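```cpp
#include <string>
#include <boost/algorithm/string/replace.hpp>

void escape(std::string *data) {
    using boost::algorithm::replace_all;
    // '&' must be replaced first, or the '&' of every entity added below
    // would itself be escaped again.
    replace_all(*data, "&",  "&amp;");
    replace_all(*data, "\"", "&quot;");
    replace_all(*data, "\'", "&apos;");
    replace_all(*data, "<",  "&lt;");
    replace_all(*data, ">",  "&gt;");
}
```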
Could this win the prize for least verbose?
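The order of the replace_all calls matters: & has to be handled first, otherwise the & introduced by an earlier replacement gets escaped a second time. The sketch below illustrates this without Boost, using a hypothetical std-only helper replace_all_std that, like replace_all, replaces every non-overlapping occurrence from left to right.

```cpp
#include <string>

// Hypothetical std-only stand-in for boost::algorithm::replace_all:
// replaces every occurrence of `from` with `to`, scanning left to right
// and resuming after each inserted `to` so it is never rescanned.
void replace_all_std(std::string& s, const std::string& from, const std::string& to) {
    if (from.empty())
        return;
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
}

// Correct order: '&' is escaped before any entity is introduced.
std::string escape_ok(std::string s) {
    replace_all_std(s, "&", "&amp;");
    replace_all_std(s, "<", "&lt;");
    return s;
}

// Wrong order: the '&' inside the freshly inserted "&lt;" is escaped again.
std::string escape_wrong(std::string s) {
    replace_all_std(s, "<", "&lt;");
    replace_all_std(s, "&", "&amp;");
    return s;
}
```

With the input a<b&c, escape_ok yields a&lt;b&amp;c, while escape_wrong double-escapes the < into a&amp;lt;b&amp;c.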