Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Building a graph of the structure of an XML document

I'd like to build a graph showing which tags are used as children of which other tags in a given XML document.

I've written this function to get the unique set of child tags for a given tag in an lxml.etree tree:

def iter_unique_child_tags(root, tag):
    """Iterates through unique child tags for all instances of tag.

    Iteration starts at `root`.
    """
    found_child_tags = set()
    instances = root.iterdescendants(tag)
    from itertools import chain
    child_nodes = chain.from_iterable(i.getchildren() for i in instances)
    child_tags = (n.tag for n in child_nodes)
    for t in child_tags:
        if t not in found_child_tags:
            found_child_tags.add(t)
            yield t

Is there a general-purpose graph builder that I could use with this function to build a dotfile or a graph in some other format?

I'm also getting the sneaking suspicion that there is a tool somewhere explicitly designed for this purpose; what might that be?

like image 702
intuited Avatar asked Jul 17 '10 20:07

intuited


People also ask

What is the structure of an XML document?

XML documents are formed as element trees. An XML tree starts at a root element and branches from the root to child elements. The terms parent, child, and sibling are used to describe the relationships between elements.

What is XML explain XML document structure with example?

An XML document is a basic unit of XML information composed of elements and other markup in an orderly package. An XML document can contains wide variety of data. For example, database of numbers, numbers representing molecular structure or a mathematical equation.

Is XML a graph?

An XML document is a tree, which is a particular sort of directed graph.

Is XML designed for structured data structure?

XML is for structuring data XML makes it easy for a computer to generate data, read data, and ensure that the data structure is unambiguous. XML avoids common pitfalls in language design: it is extensible, platform-independent, and it supports internationalization and localization. XML is fully Unicode-compliant.


1 Answers

I ended up using python-graph. I also ended up using argparse to build a command line interface that pulls some basic bits of info from XML documents and builds graph images in formats supported by pydot. It's called xmlearn and is sort of useful:

usage: xmlearn [-h] [-i INFILE] [-p PATH] {graph,dump,tags} ...

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        The XML file to learn about. Defaults to stdin.
  -p PATH, --path PATH  An XPath to be applied to various actions.
                        Defaults to the root node.

subcommands:
  {graph,dump,tags}
    dump                Dump xml data according to a set of rules.
    tags                Show information about tags.
    graph               Build a graph from the XML tags relationships.
like image 110
intuited Avatar answered Oct 11 '22 10:10

intuited