I've encountered what I think is strange behavior when using the libxml2 SAX parser, and I wanted to know whether it's normal.
I'm sending this XML through the SAX parser:
<site url="http://example.com/?a=b&amp;b=c" />
The "&amp;" gets converted to "&#38;" by the time the startElement callback is called. Is it supposed to do that? If so, I would like to understand why.
I've pasted an example demonstrating the issue here:
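#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>

/* SAX1 callback: atts is a NULL-terminated array of alternating
   attribute names and values, or NULL if there are no attributes. */
static void start_element(void *ctx, const xmlChar *name, const xmlChar **atts)
{
    int i = 0;
    if (atts == NULL)
        return;
    while (atts[i] != NULL) {
        printf("%s\n", (const char *)atts[i]);
        i++;
    }
}

int main(int argc, char *argv[])
{
    xmlSAXHandlerPtr handler = calloc(1, sizeof(xmlSAXHandler));
    const char *xml = "<site url=\"http://example.com/?a=b&amp;b=c\" />";
    if (handler == NULL)
        return 1;
    handler->startElement = start_element;
    xmlSAXUserParseMemory(handler, NULL, xml, (int)strlen(xml));
    free(handler);
    return 0;
}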
PS: This message was originally posted to the libxml2 mailing list. I am not its original author, but I ran into the same problem while using Nokogiri, and Aaron (the maintainer of Nokogiri) posted the message himself.
This message describes the same problem (which I had as well), and the response says to "ask the parser to replace entities values".
In practice, that means setting the XML_PARSE_NOENT option on the parser context when you create it, like this:
xmlParserCtxtPtr context = xmlCreatePushParserCtxt(&yourSAXHandlerStruct, self, NULL, 0, NULL);
xmlCtxtUseOptions(context, XML_PARSE_NOENT);