 

How do I retrieve an XML entity value in C#?

Tags: c#, .net, xml, entity

I want to be able to display a list of entity names and values in a C#/.NET 4.0 application.

I am able to retrieve the entity names easily enough using XmlDocument.DocumentType.Entities, but is there a good way to retrieve the values of those entities?

I noticed that I can retrieve the value of text-only entities using InnerText, but this doesn't work for entities that contain XML tags.
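
A minimal sketch of what is described above, using the sample document shown further down: reading `InnerText` on each `XmlEntity` in `DocumentType.Entities` returns only the character data, so `test` comes back as `only a test`. It also shows one possible workaround, assuming the entity node exposes its parsed replacement text as child nodes (which is how the .NET `XmlDocument` DOM behaves): concatenating each child's `OuterXml` keeps the `<para>` tags. `EntityInspector`, `TextValues` and `MarkupValues` are illustrative names, not library APIs.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

public static class EntityInspector
{
    // Name -> InnerText: character data only, tags are dropped.
    public static Dictionary<string, string> TextValues(XmlDocument doc)
    {
        var values = new Dictionary<string, string>();
        if (doc.DocumentType == null) return values; // no DOCTYPE, no entities
        foreach (XmlNode entity in doc.DocumentType.Entities)
        {
            values[entity.Name] = entity.InnerText;
        }
        return values;
    }

    // Name -> concatenated OuterXml of the entity's child nodes.
    // Assumption: the XmlEntity node holds its parsed replacement text as
    // children (elements, text, ...), so markup such as <para> survives.
    public static Dictionary<string, string> MarkupValues(XmlDocument doc)
    {
        var values = new Dictionary<string, string>();
        if (doc.DocumentType == null) return values;
        foreach (XmlNode entity in doc.DocumentType.Entities)
        {
            var sb = new StringBuilder();
            foreach (XmlNode child in entity.ChildNodes)
            {
                sb.Append(child.OuterXml);
            }
            values[entity.Name] = sb.ToString();
        }
        return values;
    }
}
```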

Is resorting to a regex my best option?
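
If the raw text between the quotes is enough, a regex is workable as long as it only scans the DOCTYPE's internal subset (`XmlDocumentType.InternalSubset`) rather than the whole file. A sketch under those assumptions: it handles general entities with a quoted literal value, skips parameter (`%`) and external (`SYSTEM`/`PUBLIC`) entities, and does not cope with declarations inside comments or entities declared in an external DTD. `EntityDeclarations.Parse` is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EntityDeclarations
{
    // <!ENTITY name "value"> or <!ENTITY name 'value'>
    // Group 1 = name, group 2 = opening quote, group 3 = raw value.
    // A name starting with % (parameter entity) or a SYSTEM/PUBLIC
    // external entity does not match.
    private static readonly Regex Declaration = new Regex(
        @"<!ENTITY\s+([^\s%""']+)\s+([""'])(.*?)\2\s*>",
        RegexOptions.Singleline);

    // internalSubset: the text between [ and ] in the DOCTYPE,
    // e.g. XmlDocumentType.InternalSubset.
    public static Dictionary<string, string> Parse(string internalSubset)
    {
        var values = new Dictionary<string, string>();
        foreach (Match m in Declaration.Matches(internalSubset))
        {
            string name = m.Groups[1].Value;
            // XML makes the first declaration of a name binding; ignore repeats.
            if (!values.ContainsKey(name))
            {
                values.Add(name, m.Groups[3].Value);
            }
        }
        return values;
    }
}
```

Called as `EntityDeclarations.Parse(doc.DocumentType.InternalSubset)`, this should return `copy` as the unexpanded `&#xA9;`, i.e. the "text just as it is in the quotes" option.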

Let's say that I have a document like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!ENTITY test "<para>only a test</para>">
  <!ENTITY wwwc "World Wide Web Corporation">
  <!ENTITY copy "&#xA9;">
]>

<document>
  <!-- The following image is the World Wide Web Corporation logo. -->
  <graphics image="logo" alternative="&wwwc; Logo"/>
</document>

I want to present a list to the user containing the three entity names (test, wwwc, and copy), along with their values (the text in quotes following the name). I had not thought through the question of entities nested within other entities, so I would be interested in a solution that either completely expands the entity values or shows the text just as it is in the quotes.

asked Mar 16 '26 21:03 by Scott


2 Answers

Although it’s not likely the most elegant solution possible, I came up with something that seems to work well enough for my purposes. First, I parsed the original document and retrieved the entity nodes from that document. Then I created a small in-memory XML document, to which I added all the entity nodes. Next, I added entity references to all of the entities within the temporary XML. Finally, I retrieved the InnerXml from all of the references.

Here's some sample code:

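        // requires: using System; using System.Collections.Generic; using System.Net; using System.Xml;
        // path is the location of the XML file to inspect

        // parse the original document and retrieve its entities
        XmlDocument parsedXmlDocument = new XmlDocument();
        XmlUrlResolver resolver = new XmlUrlResolver();
        resolver.Credentials = CredentialCache.DefaultCredentials;
        parsedXmlDocument.XmlResolver = resolver;
        parsedXmlDocument.Load(path);
        if (parsedXmlDocument.DocumentType == null)
        {
            // without a DOCTYPE there are no entity declarations to copy
            throw new InvalidOperationException("The document has no DOCTYPE declaration.");
        }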

        // create a temporary xml document with all the entities and add references to them
        // the references can then be used to retrieve the value for each entity
        XmlDocument entitiesXmlDocument = new XmlDocument();
        XmlDeclaration dec = entitiesXmlDocument.CreateXmlDeclaration("1.0", null, null);
        entitiesXmlDocument.AppendChild(dec);
        XmlDocumentType newDocType = entitiesXmlDocument.CreateDocumentType(parsedXmlDocument.DocumentType.Name, parsedXmlDocument.DocumentType.PublicId, parsedXmlDocument.DocumentType.SystemId, parsedXmlDocument.DocumentType.InternalSubset);
        entitiesXmlDocument.AppendChild(newDocType);
        XmlElement root = entitiesXmlDocument.CreateElement("xmlEntitiesDoc");
        entitiesXmlDocument.AppendChild(root);
        XmlNamedNodeMap entitiesMap = entitiesXmlDocument.DocumentType.Entities;

        // build a dictionary of entity names and values
        Dictionary<string, string> entitiesDictionary = new Dictionary<string, string>();
        for (int i = 0; i < entitiesMap.Count; i++)
        {
            XmlElement entityElement = entitiesXmlDocument.CreateElement(entitiesMap.Item(i).Name);
            XmlEntityReference entityRefElement = entitiesXmlDocument.CreateEntityReference(entitiesMap.Item(i).Name);
            entityElement.AppendChild(entityRefElement);
            root.AppendChild(entityElement);
            if (!string.IsNullOrEmpty(entityElement.ChildNodes[0].InnerXml))
            {
                // do not add parameter entities or invalid entities
                // this can be determined by checking for an empty string
                entitiesDictionary.Add(entitiesMap.Item(i).Name, entityElement.ChildNodes[0].InnerXml);
            }
        }
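
The second XmlDocument may not be strictly needed. A sketch of the same idea against the loaded document itself, assuming (as the code above already relies on) that an `XmlEntityReference` is expanded once it is appended under an element. Here the `holder` element is created but never inserted into the tree, so the original document is left unchanged. `EntityReferenceValues.Get` and the element name `holder` are illustrative.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class EntityReferenceValues
{
    // Illustrative helper: name -> InnerXml of an expanded reference to each
    // entity declared in doc's DOCTYPE.
    public static Dictionary<string, string> Get(XmlDocument doc)
    {
        var values = new Dictionary<string, string>();
        if (doc.DocumentType == null)
        {
            return values; // no DOCTYPE, so no entity declarations
        }

        foreach (XmlNode entity in doc.DocumentType.Entities)
        {
            // "holder" is a hypothetical element name; the element is never
            // inserted into doc's tree, so the document itself is unchanged.
            XmlElement holder = doc.CreateElement("holder");
            XmlEntityReference reference = doc.CreateEntityReference(entity.Name);
            // Assumption: appending the reference under an element is what
            // makes the DOM expand it (the answer above relies on this too).
            holder.AppendChild(reference);
            string value = reference.InnerXml;
            // Same empty-string filter as the answer above.
            if (!string.IsNullOrEmpty(value))
            {
                values.Add(entity.Name, value);
            }
        }
        return values;
    }
}
```

Usage: `var values = EntityReferenceValues.Get(parsedXmlDocument);` after `parsedXmlDocument.Load(path)`.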

answered Mar 19 '26 09:03 by Scott


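This is one way (untested). It uses XmlReader and its ResolveEntity() method. Note that it only finds entities that are actually referenced somewhere in the document, and the reader has to report entity references instead of expanding them automatically, for example an XmlTextReader, whose EntityHandling defaults to ExpandCharEntities: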

// requires: using System.Collections.Generic; using System.Text; using System.Xml;
private Dictionary<string, string> GetEntities(XmlReader xr)
{
    Dictionary<string, string> entityList = new Dictionary<string, string>();

    while (xr.Read())
    {
        HandleNode(xr, entityList);
    }
    return entityList;
}

StringBuilder sbEntityResolver;
int extElementIndex = 0;
int resolveEntityNestLevel = -1;
string dtdCurrentTopEntity = "";

private void HandleNode(XmlReader inReader, Dictionary<string, string> entityList)
{
    if (inReader.NodeType == XmlNodeType.Element)
    {
        if (resolveEntityNestLevel < 0)
        {
            while (inReader.MoveToNextAttribute())
            {
                HandleNode(inReader, entityList); // for namespaces
                while (inReader.ReadAttributeValue())
                {
                    HandleNode(inReader, entityList); // recursive for resolving entity refs in attributes
                }
            }
        }
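        else
        {
            // inside an entity: capture the element with its markup
            extElementIndex++;
            sbEntityResolver.Append(inReader.ReadOuterXml());
            // ReadOuterXml leaves the reader on the node after the element
            // (often the EndEntity). Handle that node here; otherwise the
            // next Read() in GetEntities would skip it.
            HandleNode(inReader, entityList);
        }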
    }
    else if (inReader.NodeType == XmlNodeType.EntityReference)
    {
        if (inReader.Name[0] != '#' && !entityList.ContainsKey(inReader.Name))
        {
            if (resolveEntityNestLevel < 0)
            {
                sbEntityResolver = new StringBuilder(); // start building entity
                dtdCurrentTopEntity = inReader.Name;
            }
            // entityReference can have contents that contains other
            // entityReferences, so keep track of nest level
            resolveEntityNestLevel++;
            inReader.ResolveEntity();
        }
    }
    else if (inReader.NodeType == XmlNodeType.EndEntity)
    {
        resolveEntityNestLevel--;
        if (resolveEntityNestLevel < 0)
        {
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
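    else if (inReader.NodeType == XmlNodeType.Text
             || inReader.NodeType == XmlNodeType.Whitespace
             || inReader.NodeType == XmlNodeType.SignificantWhitespace)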
    {
        if (resolveEntityNestLevel > -1)
        {
            sbEntityResolver.Append(inReader.Value);
        }
    }
}
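
The answer above does not show how to create a suitable reader. A pared-down sketch of the same `ResolveEntity` mechanism, assuming an `XmlTextReader` with `EntityHandling.ExpandCharEntities` so that general entities arrive as `EntityReference` nodes. It collects text only (tags inside an entity are skipped, their text is kept) and, like the answer, only sees entities the document actually references, so `test` and `copy` from the question's sample would not appear. `ReferencedEntityText` and `Collect` are illustrative names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class ReferencedEntityText
{
    // Illustrative helper: name -> character data for every general entity
    // the document references, in element content or in attribute values.
    public static Dictionary<string, string> Collect(string xml)
    {
        var result = new Dictionary<string, string>();
        // Entities currently being expanded, innermost last.
        var open = new List<KeyValuePair<string, StringBuilder>>();

        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Report general entities as EntityReference nodes (XmlTextReader's
            // default, set explicitly here for clarity).
            reader.EntityHandling = EntityHandling.ExpandCharEntities;
            while (reader.Read())
            {
                Visit(reader, open, result);
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // Attribute values may reference entities too.
                    while (reader.MoveToNextAttribute())
                    {
                        while (reader.ReadAttributeValue())
                        {
                            Visit(reader, open, result);
                        }
                    }
                    reader.MoveToElement();
                }
            }
        }
        return result;
    }

    private static void Visit(XmlTextReader reader,
                              List<KeyValuePair<string, StringBuilder>> open,
                              Dictionary<string, string> result)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.EntityReference:
                open.Add(new KeyValuePair<string, StringBuilder>(reader.Name, new StringBuilder()));
                reader.ResolveEntity(); // following nodes are the entity's contents
                break;
            case XmlNodeType.EndEntity:
                KeyValuePair<string, StringBuilder> finished = open[open.Count - 1];
                open.RemoveAt(open.Count - 1);
                if (!result.ContainsKey(finished.Key))
                {
                    result.Add(finished.Key, finished.Value.ToString());
                }
                break;
            case XmlNodeType.Text:
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                // Text inside a nested entity also belongs to every enclosing one.
                foreach (KeyValuePair<string, StringBuilder> entry in open)
                {
                    entry.Value.Append(reader.Value);
                }
                break;
        }
    }
}
```

Usage: `ReferencedEntityText.Collect(File.ReadAllText(path))`. Keeping the markup inside entities needs something like the `ReadOuterXml()` handling in the answer above.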

answered Mar 19 '26 10:03 by pgfearo