
How to resolve an internally-declared XML entity reference using NSXMLParser

I have an XML file that uses internally-declared entities. For example:

<?xml version="1.0" encoding="UTF-8"?>

...

<!ENTITY my_symbol "my symbol value">

...

<my_element>
    <my_next_element>foo&my_symbol;bar</my_next_element>
</my_element>

...

Using the NSXMLParser class, how can I resolve the my_symbol entity reference?

From experimentation, the parser:foundInternalEntityDeclarationWithName:value: delegate method will be called for the my_symbol entity declaration, with value "my symbol value". Then, when the my_next_element element is reached, NSXMLParser will call the parser:didStartElement:namespaceURI:qualifiedName:attributes: delegate method.

Before parser:didEndElement:namespaceURI:qualifiedName: is called for </my_next_element>, the parser:foundCharacters: delegate method will be called twice with the strings:

  1. "foo"
  2. "bar"

The my_symbol entity reference is ignored. What is required in order for the entity reference to be resolved?

EDIT:

Removing the ENTITY declaration of my_symbol from the DTD will result in an NSXMLParserUndeclaredEntityError. This suggests that when the entity declaration is present, and then referenced in <my_next_element>, it is being noticed. For some reason it's just not being resolved to the string it represents.

Also, if &amp; is used within an element, the parser will correctly resolve it to "&" and this is passed as the string when the parser:foundCharacters: delegate method is called.
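For anyone who wants to reproduce the behaviour described above, here is a minimal sketch. It assumes the entity is declared in an internal DTD subset (the `<!DOCTYPE my_element [...]>` wrapper is my assumption, since the original file is elided), a modern SDK with the formal `NSXMLParserDelegate` protocol, and ARC. `EntityLogger` is a hypothetical class name. It only logs the callbacks; it does not resolve anything.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that logs the callbacks discussed in the question.
@interface EntityLogger : NSObject <NSXMLParserDelegate>
@end

@implementation EntityLogger

- (void)parser:(NSXMLParser *)parser foundInternalEntityDeclarationWithName:(NSString *)name value:(NSString *)value {
    NSLog(@"entity declaration: %@ = \"%@\"", name, value);
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    NSLog(@"start element: %@", elementName);
}

// May be called several times per element, and also for whitespace between elements.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    NSLog(@"characters: \"%@\"", string);
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    NSLog(@"end element: %@", elementName);
}

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
    NSLog(@"parse error: %@", parseError);
}

@end

int main(void) {
    @autoreleasepool {
        // Assumed document layout: the ENTITY declaration sits in an internal DTD subset.
        NSString *xml =
            @"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            @"<!DOCTYPE my_element [\n"
            @"  <!ENTITY my_symbol \"my symbol value\">\n"
            @"]>\n"
            @"<my_element>\n"
            @"    <my_next_element>foo&my_symbol;bar</my_next_element>\n"
            @"</my_element>\n";

        NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

        // The parser does not retain its delegate, so keep a local reference alive.
        EntityLogger *logger = [[EntityLogger alloc] init];
        parser.delegate = logger;

        BOOL ok = [parser parse];
        NSLog(@"parse %@", ok ? @"succeeded" : @"failed");
    }
    return 0;
}
```

Comparing the logged `characters:` lines against the text of `<my_next_element>` should show whether the entity value is ever delivered on a given OS version.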

Ben Lever asked Oct 09 '09 00:10



1 Answer

I reviewed NSXMLParser.h, which declares the following methods for delegates to implement:

@interface NSObject (NSXMLParserDelegateEventAdditions)
// Document handling methods
- (void)parserDidStartDocument:(NSXMLParser *)parser;
    // sent when the parser begins parsing of the document.
- (void)parserDidEndDocument:(NSXMLParser *)parser;
    // sent when the parser has completed parsing. If this is encountered, the parse was successful.

// DTD handling methods for various declarations.
- (void)parser:(NSXMLParser *)parser foundNotationDeclarationWithName:(NSString *)name publicID:(NSString *)publicID systemID:(NSString *)systemID;

- (void)parser:(NSXMLParser *)parser foundUnparsedEntityDeclarationWithName:(NSString *)name publicID:(NSString *)publicID systemID:(NSString *)systemID notationName:(NSString *)notationName;

- (void)parser:(NSXMLParser *)parser foundAttributeDeclarationWithName:(NSString *)attributeName forElement:(NSString *)elementName type:(NSString *)type defaultValue:(NSString *)defaultValue;

- (void)parser:(NSXMLParser *)parser foundElementDeclarationWithName:(NSString *)elementName model:(NSString *)model;

- (void)parser:(NSXMLParser *)parser foundInternalEntityDeclarationWithName:(NSString *)name value:(NSString *)value;

- (void)parser:(NSXMLParser *)parser foundExternalEntityDeclarationWithName:(NSString *)name publicID:(NSString *)publicID systemID:(NSString *)systemID;

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict;
    // sent when the parser finds an element start tag.
    // In the case of the cvslog tag, the following is what the delegate receives:
    //   elementName == cvslog, namespaceURI == http://xml.apple.com/cvslog, qualifiedName == cvslog
    // In the case of the radar tag, the following is what's passed in:
    //    elementName == radar, namespaceURI == http://xml.apple.com/radar, qualifiedName == radar:radar
    // If namespace processing >isn't< on, the xmlns:radar="http://xml.apple.com/radar" is returned as an attribute pair, the elementName is 'radar:radar' and there is no qualifiedName.

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName;
    // sent when an end tag is encountered. The various parameters are supplied as above.

- (void)parser:(NSXMLParser *)parser didStartMappingPrefix:(NSString *)prefix toURI:(NSString *)namespaceURI;
    // sent when the parser first sees a namespace attribute.
    // In the case of the cvslog tag, before the didStartElement:, you'd get one of these with prefix == @"" and namespaceURI == @"http://xml.apple.com/cvslog" (i.e. the default namespace)
    // In the case of the radar:radar tag, before the didStartElement: you'd get one of these with prefix == @"radar" and namespaceURI == @"http://xml.apple.com/radar"

- (void)parser:(NSXMLParser *)parser didEndMappingPrefix:(NSString *)prefix;
    // sent when the namespace prefix in question goes out of scope.

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string;
    // This returns the string of the characters encountered thus far. You may not necessarily get the longest character run. The parser reserves the right to hand these to the delegate as potentially many calls in a row to -parser:foundCharacters:

- (void)parser:(NSXMLParser *)parser foundIgnorableWhitespace:(NSString *)whitespaceString;
    // The parser reports ignorable whitespace in the same way as characters it's found.

- (void)parser:(NSXMLParser *)parser foundProcessingInstructionWithTarget:(NSString *)target data:(NSString *)data;
    // The parser reports a processing instruction to you using this method. In the case above, target == @"xml-stylesheet" and data == @"type='text/css' href='cvslog.css'"

- (void)parser:(NSXMLParser *)parser foundComment:(NSString *)comment;
    // A comment (Text in a <!-- --> block) is reported to the delegate as a single string

- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock;
    // this reports a CDATA block to the delegate as an NSData.

- (NSData *)parser:(NSXMLParser *)parser resolveExternalEntityName:(NSString *)name systemID:(NSString *)systemID;
    // this gives the delegate an opportunity to resolve an external entity itself and reply with the resulting data.

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError;
    // ...and this reports a fatal error to the delegate. The parser will stop parsing.

- (void)parser:(NSXMLParser *)parser validationErrorOccurred:(NSError *)validationError;
    // If validation is on, this will report a fatal validation error to the delegate. The parser will stop parsing.
@end

Based on the order of entries in the header, it looks like the found...Declaration methods are expected to be called before any elements are found (as you've discovered). I'd try implementing all of these methods to see whether any of them is called for the entity reference, but they all look like they are designed for other uses.

I wonder if there is a way to instrument all the unhandled messages sent to your delegate just in case the documentation/interface is incomplete.
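One possible way to do that instrumentation, as a rough sketch: override `respondsToSelector:` on the delegate and log every selector it is asked about. This assumes NSXMLParser checks for optional delegate methods with `respondsToSelector:`, which is the usual Cocoa convention but is not something the header states. `ProbeLoggingDelegate` is a hypothetical class name, and the tiny document in `main` is only there to drive the parser.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that implements no parser callbacks at all.
// It only records which selectors it is asked about.
@interface ProbeLoggingDelegate : NSObject <NSXMLParserDelegate>
@end

@implementation ProbeLoggingDelegate

// Log each selector that is probed, then answer exactly as NSObject would.
- (BOOL)respondsToSelector:(SEL)aSelector {
    BOOL responds = [super respondsToSelector:aSelector];
    NSLog(@"probed: %@ (implemented: %@)",
          NSStringFromSelector(aSelector), responds ? @"YES" : @"NO");
    return responds;
}

@end

int main(void) {
    @autoreleasepool {
        NSString *xml =
            @"<!DOCTYPE a [ <!ENTITY e \"value\"> ]>"
            @"<a>x&e;y</a>";
        NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

        // Keep a local reference: the parser does not retain its delegate.
        ProbeLoggingDelegate *probe = [[ProbeLoggingDelegate alloc] init];
        parser.delegate = probe;

        [parser parse];
    }
    return 0;
}
```

Any probed selector that does not appear in NSXMLParser.h would be a candidate for an undocumented callback. The parser may check once and cache the answers, so the log shows what it could call rather than when it would call it.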

Epsilon Prime answered Oct 08 '22 20:10