 

Is this XML valid, and how to create it with TXMLDocument

Is this XML "valid"?

<?xml version="1.0"?>
<p class="leaders">
    Todd
    <span class="leader-type">.</span>
    R
    <span class="leader-type">.</span>
    Colas
</p>

I've never seen an XML document where a node has multiple "values", the way the <p> node does here.

How do I parse out the three values of <p> with TXMLDocument? And how do I traverse to the <span> nodes?

Finally, how do I create an XML document like this with TXMLDocument?

Help!!!!

asked Dec 15 '22 21:12 by user1498879


2 Answers

When you ask whether it is "valid", I think you mean: is it well-formed? (We can't tell whether it is valid without a DTD or schema.)

Yes, it is well-formed. It is a perfectly normal example of a document containing mixed content, which is exactly what XML is designed for.

I can't answer your questions about TXMLDocument because I've never heard of it; presumably it's part of a Delphi XML library.

answered Dec 31 '22 08:12 by Michael Kay


Yes, it is well-formed XML. To parse it, you have to understand that XML is represented as a tree of nodes. This XML parses into the following tree structure:

p
|_ attributes
| |_ "class"="leaders"
|
|_ children
  |_ #text "Todd"
  |
  |_ span
  | |_ attributes
  | | |_ "class"="leader-type"
  | |
  | |_ children
  |   |_ #text "."
  |
  |_ #text "R"
  |
  |_ span
  | |_ attributes
  | | |_ "class"="leader-type"
  | |
  | |_ children
  |   |_ #text "."
  |
  |_ #text "Colas"

Each attribute and child node is represented as a separate IXMLNode interface in the TXMLDocument. As you can see, the plain-text portions are separated into their own #text nodes.
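To see the tree above for yourself, a small console program can walk the nodes recursively and print each element name, attribute and #text value. This is a sketch under assumptions the answer does not state: Delphi XE2 or later (unit-scoped names such as Xml.XMLDoc), Windows with the default MSXML DOM vendor (hence the CoInitialize call), and the question's XML with its indentation removed so that whitespace handling does not affect the result. DumpNode, Run and SampleXML are illustrative names, not part of the library.

```delphi
program DumpXmlTree;

{$APPTYPE CONSOLE}

// Sketch only: assumes Delphi XE2+ on Windows with the default MSXML vendor.
// DumpNode, Run and SampleXML are hypothetical names used for illustration.
uses
  System.SysUtils, System.Variants, Winapi.ActiveX,
  Xml.XMLIntf, Xml.XMLDoc;

const
  // The question's document with the indentation removed.
  SampleXML =
    '<?xml version="1.0"?>' +
    '<p class="leaders">Todd<span class="leader-type">.</span>R' +
    '<span class="leader-type">.</span>Colas</p>';

// Print a node, then its attributes, then recurse into its children,
// indenting each level by two spaces.
procedure DumpNode(const Node: IXMLNode; const Indent: string);
var
  I: Integer;
begin
  if Node.NodeType = ntText then
    Writeln(Indent, '#text "', VarToStr(Node.NodeValue), '"')
  else
  begin
    Writeln(Indent, Node.NodeName);
    for I := 0 to Node.AttributeNodes.Count - 1 do
      Writeln(Indent, '  @', Node.AttributeNodes[I].NodeName, '="',
        VarToStr(Node.AttributeNodes[I].NodeValue), '"');
    for I := 0 to Node.ChildNodes.Count - 1 do
      DumpNode(Node.ChildNodes[I], Indent + '  ');
  end;
end;

// Keeping the interface references local ensures they are released
// before CoUninitialize runs.
procedure Run;
var
  Doc: IXMLDocument;
begin
  Doc := LoadXMLData(SampleXML);
  DumpNode(Doc.DocumentElement, '');
end;

begin
  CoInitialize(nil);
  try
    Run;
  finally
    CoUninitialize;
  end;
end.
```

The output should mirror the diagram: <p>, its class attribute, then five children alternating between #text and <span>.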

Once you have loaded the XML into TXMLDocument, the TXMLDocument.DocumentElement property represents the <p> node. That node's AttributeNodes property contains a "class" node, and its ChildNodes property contains the first level of #text and <span> nodes. The <span> nodes have their own AttributeNodes and ChildNodes lists, and so on. So to parse this, you would iterate through the tree looking for the #text nodes, using the <span> nodes to manipulate the text as needed.
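A minimal sketch of that iteration, under the same assumptions (Delphi XE2+, Windows with MSXML, indentation removed from the sample). The hypothetical SplitLeaders procedure looks only at the direct children of <p>: #text nodes are collected as the name parts, and each <span class="leader-type"> contributes its text as a separator. Trim is applied in case the text nodes carry indentation from the source document.

```delphi
program ParseLeaders;

{$APPTYPE CONSOLE}

// Sketch only: assumes Delphi XE2+ on Windows with the default MSXML vendor.
// SplitLeaders, Run and SampleXML are hypothetical names.
uses
  System.SysUtils, System.Classes, System.Variants, Winapi.ActiveX,
  Xml.XMLIntf, Xml.XMLDoc;

const
  SampleXML =
    '<p class="leaders">Todd<span class="leader-type">.</span>R' +
    '<span class="leader-type">.</span>Colas</p>';

// Walk the direct children of <p>: #text nodes go into Parts, and the text
// of each <span class="leader-type"> goes into Separators.
procedure SplitLeaders(const P: IXMLNode; Parts, Separators: TStrings);
var
  I: Integer;
  Child: IXMLNode;
begin
  for I := 0 to P.ChildNodes.Count - 1 do
  begin
    Child := P.ChildNodes[I];
    case Child.NodeType of
      ntText:
        Parts.Add(Trim(VarToStr(Child.NodeValue)));
      ntElement:
        if (Child.NodeName = 'span') and
           (VarToStr(Child.Attributes['class']) = 'leader-type') then
          Separators.Add(Child.Text);
    end;
  end;
end;

procedure Run;
var
  Doc: IXMLDocument;
  Parts, Separators: TStringList;
begin
  Doc := LoadXMLData(SampleXML);
  Parts := TStringList.Create;
  Separators := TStringList.Create;
  try
    SplitLeaders(Doc.DocumentElement, Parts, Separators);
    Writeln('Text parts: ', Parts.CommaText);
    Writeln('Separators: ', Separators.CommaText);
  finally
    Separators.Free;
    Parts.Free;
  end;
end;

begin
  CoInitialize(nil);
  try
    Run;
  finally
    CoUninitialize;
  end;
end.
```

Because NodeType distinguishes ntText from ntElement, the same loop works no matter how many text and <span> nodes are interleaved.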

To create such a document, you simply create the individual nodes as needed, e.g.:

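// Assumes Doc is a TXMLDocument (with empty XML and FileName properties)
// and Node, Child are IXMLNode variables.
// Deactivate, then reactivate, to start from a fresh, empty document.
Doc.Active := False;
Doc.Active := True;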

Node := Doc.AddChild('p');
Node.Attributes['class'] := 'leaders';

Child := Doc.CreateNode('Todd', ntText);
Node.ChildNodes.Add(Child);

Child := Node.AddChild('span');
Child.Attributes['class'] := 'leader-type';
Child.Text := '.';

Child := Doc.CreateNode('R', ntText);
Node.ChildNodes.Add(Child);

Child := Node.AddChild('span');
Child.Attributes['class'] := 'leader-type';
Child.Text := '.';

Child := Doc.CreateNode('Colas', ntText);
Node.ChildNodes.Add(Child);

Doc.SaveTo...(...); // generate the XML to your preferred output

If you want whitespace or line breaks to appear in the XML output, simply include those characters in the content of the #text nodes. When parsing XML into TXMLDocument, unnecessary whitespace is stripped by default. If you want to preserve it, enable the poPreserveWhiteSpace flag in the ParseOptions property before loading the XML.
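Putting the creation code and the whitespace notes together, the following sketch builds the question's document with explicit line breaks and indentation inside the #text nodes, then reloads it with poPreserveWhiteSpace set before loading. It assumes Delphi XE2+ on Windows with MSXML, and it uses NewXMLDocument and a TXMLDocument created with a nil owner rather than a TXMLDocument component on a form; AddText, AddSeparator and BuildLeadersXml are hypothetical helper names.

```delphi
program BuildLeaders;

{$APPTYPE CONSOLE}

// Sketch only: assumes Delphi XE2+ on Windows with the default MSXML vendor.
// AddText, AddSeparator and BuildLeadersXml are hypothetical helper names.
uses
  System.SysUtils, System.Variants, Winapi.ActiveX,
  Xml.XMLIntf, Xml.XMLDoc;

const
  NL = #10;         // line break embedded in the #text nodes
  Indent = '    ';  // indentation embedded in the #text nodes

// Append a #text node to Parent; whitespace in S is written out as-is.
procedure AddText(const Doc: IXMLDocument; const Parent: IXMLNode; const S: string);
begin
  Parent.ChildNodes.Add(Doc.CreateNode(S, ntText));
end;

// Append <span class="leader-type">.</span> to Parent.
procedure AddSeparator(const Parent: IXMLNode);
var
  Span: IXMLNode;
begin
  Span := Parent.AddChild('span');
  Span.Attributes['class'] := 'leader-type';
  Span.Text := '.';
end;

function BuildLeadersXml: string;
var
  Doc: IXMLDocument;
  P: IXMLNode;
  S: DOMString;
begin
  Doc := NewXMLDocument;  // new, active, empty document
  P := Doc.AddChild('p');
  P.Attributes['class'] := 'leaders';
  AddText(Doc, P, NL + Indent + 'Todd' + NL + Indent);
  AddSeparator(P);
  AddText(Doc, P, NL + Indent + 'R' + NL + Indent);
  AddSeparator(P);
  AddText(Doc, P, NL + Indent + 'Colas' + NL);
  Doc.SaveToXML(S);
  Result := S;
end;

procedure Run;
var
  Source: string;
  Doc: IXMLDocument;
begin
  Source := BuildLeadersXml;
  Writeln(Source);

  // ParseOptions must be set before the document is loaded (made active).
  Doc := TXMLDocument.Create(nil);
  Doc.ParseOptions := [poPreserveWhiteSpace];
  Doc.LoadFromXML(Source);
  Writeln('First #text node: "',
    VarToStr(Doc.DocumentElement.ChildNodes[0].NodeValue), '"');
end;

begin
  CoInitialize(nil);
  try
    Run;
  finally
    CoUninitialize;
  end;
end.
```

Assigning TXMLDocument.Create(nil) directly to an IXMLDocument variable lets reference counting free it, which is why no Free call appears.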

answered Dec 31 '22 08:12 by Remy Lebeau