I have an XML file that encodes a directed acyclic graph (DAG) that represents a partial order. Such graphs are useful for things like specifying dependencies and finding critical paths. For the curious, my current application is to specify component dependencies for a build system, so vertices are components and edges specify compile-time dependencies. Here is a simple example:
<?xml version="1.0"?>
<dag>
<vertex name="A">
<directed-edge-to vertex="C"/>
</vertex>
<vertex name="B">
<directed-edge-to vertex="C"/>
<directed-edge-to vertex="D"/>
</vertex>
<vertex name="C">
<directed-edge-to vertex="E"/>
</vertex>
<vertex name="D">
<directed-edge-to vertex="E"/>
</vertex>
<vertex name="E">
<directed-edge-to vertex="G"/>
</vertex>
<vertex name="F">
<directed-edge-to vertex="G"/>
</vertex>
<vertex name="G"/>
</dag>
This DAG may be drawn like this:
(source: iparelan.com)
I'd like to apply an XSLT stylesheet that produces another XML
document that contains only the vertices that correspond to minimal elements of the partial order. That is, those vertices that have no incoming edges. The set of minimal vertices for the example graph is {A, B, F}
. For my build dependency application, finding this set is valuable because I know that if I build the members of this set, then everything in my project will be built.
Here is my current stylesheet solution (I'm running this with Xalan on Java using Apache Ant's xslt
task). A key observation is that a minimal vertex will not be referenced in any directed-edge-to
element:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
exclude-result-prefixes="xalan">
<xsl:output method="xml" indent="yes" xalan:indent-amount="4"/>
<xsl:template match="dag">
<minimal-vertices>
<xsl:for-each select="//vertex">
<xsl:if test="not(//vertex/directed-edge-to[@vertex=current()/@name])">
<minimal-vertex name="{@name}"/>
</xsl:if>
</xsl:for-each>
</minimal-vertices>
</xsl:template>
</xsl:stylesheet>
Applying this stylesheet produces the following output (which I believe is correct):
<?xml version="1.0" encoding="UTF-8"?>
<minimal-vertices>
<minimal-vertex name="A"/>
<minimal-vertex name="B"/>
<minimal-vertex name="F"/>
</minimal-vertices>
The thing is, I'm not completely satisfied with this solution. I'm wondering if there is a way to combine the select
of the for-each
and the test
of the if
with XPath syntax.
I want to write something like:
<xsl:for-each select="//vertex[not(//vertex/directed-edge-to[@vertex=current()/@name])]">
But that does not do what I want because the current()
function does not reference the nodes selected by the outer //vertex
expression.
Thusfar, my solution uses XPath 1.0 and XSLT 1.0 syntax, though I'm open to XPath 2.0 and XSLT 2.0 syntax as well.
Here's the Ant build script if you like:
<?xml version="1.0"?>
<project name="minimal-dag" default="default">
<target name="default">
<xslt in="dag.xml" out="minimal-vertices.xml" style="find-minimal-vertices.xsl"/>
</target>
<target name="dot">
<xslt in="dag.xml" out="dag.dot" style="xml-to-dot.xsl"/>
</target>
</project>
The dot
target generates Graphviz Dot language code for rendering the graph. Here is xml-to-dot.xsl
:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xslt"
exclude-result-prefixes="xalan">
<xsl:output method="text"/>
<xsl:template match="dag">
digraph {
rankdir="BT";
node [style="filled", fillcolor="cyan", fontname="Helvetica"];
<xsl:apply-templates select="//directed-edge-to"/>
}
</xsl:template>
<xsl:template match="directed-edge-to">
<xsl:value-of select="concat(ancestor::vertex/@name, '->', @vertex, ';')"/>
</xsl:template>
</xsl:stylesheet>
You can take advantage of XPath's implicit existential quantification on the =
operator:
<xsl:for-each select="//vertex[not(@name = //vertex/directed-edge-to/@vertex)]">
When you use any of the six comparison operators (=
, !=
, <
, <=
, >
, and >=
) to compare a node-set, the expression will return true if any node in the node-set satisfies the condition. When comparing one node-set with another, the expression returns true if any node in the first node-set satisfies the condition when compared with any node in the second node-set. XPath 2.0 introduces six new operators that don't perform this existential quantification (eq
, ne
, lt
, le
, gt
, and ge
). But in your case, you'll want to use "=
" to get that existential quantification.
Note of course, that you'll still want to use the not()
function as you were doing. Most of the time, it's good to avoid the !=
operator. If you used it here instead of not()
, then it would return true if there are any @vertex
attributes that are not equal to the @name
value, which is not your intention. (And if either node-set is empty, then it would return false, as comparisons with empty node-sets always return false.)
If you want to use eq
instead, then you'd have to do something like you did: separate out the conditional from the iteration so you could bind current()
. But in XPath 2.0, you can do this within an expression:
<xsl:for-each select="for $v in //vertex
return $v[not(//directed-edge-to[@vertex eq $v/@name])]">
This is useful for when your condition isn't a simple equality comparison (and thus can't be existentially quantified using "=
"). For example: starts-with(@vertex, $v/@name)
.
XPath 2.0 also has an explicit way of performing existential quantification. Instead of the for
expression above, we could have written this:
<xsl:for-each select="//vertex[not(some $e in //directed-edge-to
satisfies @name eq $e/@vertex)]">
In addition to the "some
" syntax, XPath 2.0 also supplies a corresponding "every
" syntax for performing universal quantification.
Rather than using for-each
, you could also use template rules, which are more modular (and powerful):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<minimal-vertices>
<xsl:apply-templates/>
</minimal-vertices>
</xsl:template>
<!-- Copy vertex elements that have no arrows pointing to them -->
<xsl:template match="vertex[not(@name = //directed-edge-to/@vertex)]">
<minimal-vertex name="{@name}"/>
</xsl:template>
</xsl:stylesheet>
Again, in this case, we're relying on the existential quantification of =
.
XSLT 1.0 prohibits use of the current()
function in patterns, i.e., in the match
attribute, but XSLT 2.0 allows it. In that case, current()
refers to the node currently being matched. So in XSLT 2.0, we could also write this (without having to use a for
expression):
<xsl:template match="vertex[not(//directed-edge-to[@vertex eq current()/@name])]">
Note that this pattern is essentially the same as the expression you tried to use in for-each
, but whereas it doesn't do what you want in for-each
, it does do what you want in the pattern (because what current()
binds to is different).
Finally, I'll add one more variation that in some ways simplifies the logic (removing not()
). This also goes back to using XSLT 1.0:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<minimal-vertices>
<xsl:apply-templates/>
</minimal-vertices>
</xsl:template>
<!-- By default, copy vertex elements -->
<xsl:template match="vertex">
<minimal-vertex name="{@name}"/>
</xsl:template>
<!-- But strip out vertices with incoming arrows -->
<xsl:template match="vertex[@name = //directed-edge-to/@vertex]"/>
</xsl:stylesheet>
If you don't like the whitespace being output, add an empty rule for text nodes, so they'll get stripped out (overriding the default rule for text nodes, which is to copy them):
<xsl:template match="text()"/>
Or you could just be more selective in what nodes you apply templates to:
<xsl:apply-templates select="/dag/vertex"/>
Which approach you take is partially dependent on taste, partially dependent on the wider context of your stylesheet and expected data (how much the input structure might vary, etc.).
I know I went way beyond what you were asking for, but I hope you at least found this interesting. :-)
One such XPath 1.0 expression is:
/*/vertex[not(@name = /*/vertex/directed-edge-to/@vertex)]
Then just put it into an XSLT stylesheet like that:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<minimal-vertices>
<xsl:for-each select=
"/*/vertex[not(@name = /*/vertex/directed-edge-to/@vertex)]"
>
<minimal-vertex name="{@name}"/>
</xsl:for-each>
</minimal-vertices>
</xsl:template>
</xsl:stylesheet>
When this stylesheet is applied on the originally-provided XML document:
<dag>
<vertex name="A">
<directed-edge-to vertex="C"/>
</vertex>
<vertex name="B">
<directed-edge-to vertex="C"/>
<directed-edge-to vertex="D"/>
</vertex>
<vertex name="C">
<directed-edge-to vertex="E"/>
</vertex>
<vertex name="D">
<directed-edge-to vertex="E"/>
</vertex>
<vertex name="E">
<directed-edge-to vertex="G"/>
</vertex>
<vertex name="F">
<directed-edge-to vertex="G"/>
</vertex>
<vertex name="G"/>
</dag>
The wanted result is produced:
<minimal-vertices>
<minimal-vertex name="A" />
<minimal-vertex name="B" />
<minimal-vertex name="F" />
</minimal-vertices>
Do note: A solution for traversing full (maybe cyclic) graphs is available in XSLT here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With