
LibXML2 Sax Parsing and ampersand

I've encountered what I think is strange behavior when using the SAX parser, and I wanted to know if it's normal.

I'm sending this XML through the SAX parser:

<site url="http://example.com/?a=b&amp;b=c" />

The "&amp;" gets converted to "&#38;" when the startElement callback is called. Is it supposed to do that? If so, I would like to understand why.

I've pasted an example demonstrating the issue here:

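#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>

/* SAX1 startElement callback: atts is a NULL-terminated array of
   alternating attribute names and values, or NULL if there are none. */
static void start_element(void * ctx, const xmlChar *name, const xmlChar **atts)
{
  int i = 0;
  if (atts == NULL) return;
  while(atts[i] != NULL) {
    printf("%s\n", atts[i]);
    i++;
  }
}

int main(int argc, char *argv[]) {
  /* Zeroed handler: only startElement is set, all other callbacks stay NULL. */
  xmlSAXHandlerPtr handler = calloc(1, sizeof(xmlSAXHandler));
  if (handler == NULL) return 1;
  handler->startElement = start_element;

  const char * xml = "<site url=\"http://example.com/?a=b&amp;b=c\" />";

  xmlSAXUserParseMemory( handler,
                          NULL,
                          xml,
                          (int)strlen(xml)
  );

  free(handler);
  return 0;
}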

PS: This message originally comes from the LibXML2 mailing list. I am not its original author: I noticed the problem while using Nokogiri, and Aaron (the maintainer of Nokogiri) posted the original message himself.

Julien Genestoux asked Jun 11 '09 18:06


1 Answer

This message describes the same problem (which I had as well) and the response says to

ask the parser to replace entities values

What that means is when you are setting up your context, set the option like this:

xmlParserCtxtPtr context = xmlCreatePushParserCtxt(&yourSAXHandlerStruct, self, NULL, 0, NULL);
xmlCtxtUseOptions(context, XML_PARSE_NOENT);
Don answered Nov 04 '22 02:11