I have several XML files which have a similar structure but with some differences that I cannot overlook. They are all TEI documents.
I am looking for a way to outline the main structure.
Take the following text as an example:
<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="d1">
<body xml:id="d2">
<div1 type="book" xml:id="d3">
<head>Songs of Innocence</head>
<pb n="4"/>
<div2 type="poem" xml:id="d4">
<head>Introduction</head>
<lg type="stanza">
<l>Piping down the valleys wild, </l>
<l>Piping songs of pleasant glee, </l>
<l>On a cloud I saw a child, </l>
<l>And he laughing said to me: </l>
</lg>
I would like to suppress the nodes of the same type and all the repeating structures:
<body xml:id="d2">
<div1 type="book" xml:id="d3">
<head>Songs of Innocence</head>
<pb n="4"/>
<div2 type="poem" xml:id="d4">
<head>Introduction</head>
<lg type="stanza">
<l>...</l>
</lg>
<lg>...</lg>
So, basically, I want to reduce each XML document to its most basic structure. That way I can figure out how to properly convert the files using XSLT.
Here are some options for viewing your XML in a tree structure:
Note, however, that you'll need to clean up your markup. What you show doesn't qualify as XML as it's missing end tags and lacks a single root element. (XML has to be well-formed.)
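A quick way to check well-formedness before trying any tree viewer is to run the file through a parser and see whether it raises an error. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name is_well_formed is only an illustrative choice.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for missing end tags, several root elements, and similar errors.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<text><body><l>Piping</l></body></text>"))
    print(is_well_formed("<text><body><l>Piping</l>"))
```

The second call passes a fragment with missing end tags, like the excerpt in the question, so the parser rejects it.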
Using Perl's XML::DT (apt-get install libxml-dt-perl if not installed), the command mkxmltype file.xml returns a compact description of the XML structure. Example:
$ mkxmltype -lines=1000 a.xml
# text ...Fri Feb 26 17:56:24 2016
text => body * xml:id
body => div1 * xml:id
div1 => tup(div2, pb, head) * type * xml:id
div2 => tup(head, lg) * type * xml:id
pb => empty * n
head => text
lg => seq(l) * type
l => text
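If Perl is not an option, a rough equivalent of this summary can be sketched in Python with the standard library. This is not a reimplementation of mkxmltype: it only records, for each element name, which child element names and attribute names occur under it. The names local and summarize are hypothetical helpers, and the sample is the question's excerpt shortened to two lines of verse, with end tags added by assumption so that it parses. Namespace prefixes are stripped, so xml:id is reported as id.

```python
import xml.etree.ElementTree as ET

# The question's excerpt, shortened, with closing tags added (an assumption)
# so that it is well-formed.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="d1">
<body xml:id="d2">
<div1 type="book" xml:id="d3">
<head>Songs of Innocence</head>
<pb n="4"/>
<div2 type="poem" xml:id="d4">
<head>Introduction</head>
<lg type="stanza">
<l>Piping down the valleys wild, </l>
<l>Piping songs of pleasant glee, </l>
</lg>
</div2>
</div1>
</body>
</text>"""


def local(name):
    """Strip a leading '{namespace}' part from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def summarize(root):
    """Map each element name to ([child names], [attribute names]) seen under it."""
    summary = {}
    for elem in root.iter():
        children, attrs = summary.setdefault(local(elem.tag), ([], []))
        for child in elem:
            name = local(child.tag)
            if name not in children:
                children.append(name)  # repeated siblings are listed once
        for attr in elem.attrib:
            name = local(attr)
            if name not in attrs:
                attrs.append(name)
    return summary


if __name__ == "__main__":
    for tag, (children, attrs) in summarize(ET.fromstring(SAMPLE)).items():
        kids = ", ".join(children) or "(no child elements)"
        print(f"{tag} => {kids} * {' * '.join(attrs)}" if attrs else f"{tag} => {kids}")
```

Because every element name gets a single entry, repeated lg and l elements collapse into one line each, which is the kind of reduction the question asks for.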