Given the following XML-compliant HTML:
<div>
<a>a1</a>
<b>b1</b>
</div>
<div>
<b>b2</b>
</div>
<div>
<a>a3</a>
<b>b3</b>
<c>c3</c>
</div>
Evaluating //a returns:
[a1,a3]
The problem is that the data from the third group now sits in second place: when an a element is not found, its group is skipped completely.
How can I write an XPath expression that selects all a elements and returns:
[a1, null, a3]
The same applies to //c. I wonder if it's possible to get:
[null, null, c3]
UPDATE: consider another scenario where there is no common parent <div>:
<h1>heading1</h1>
<a>a1</a>
<b>b1</b>
<h1>heading2</h1>
<b>b2</b>
<h1>heading3</h1>
<a>a3</a>
<b>b3</b>
<c>c3</c>
UPDATE: I am now able to use XSLT as well.
There is no null value in XPath. There's a semi-related question here which also explains this: http://www.velocityreviews.com/forums/t686805-xpath-query-to-return-null-values.html
Realistically, you've got three options:
Use //a | //div[not(a)], which would return the div element if there was no a within it, and have your Java code handle any divs returned as 'no a element present'. Depending on the context, this may even allow you to output something more useful if required, as you'll have access to the entire contents of the div, for example an error 'no a element found in div (some identifier)'.
Add a elements with a suitable default to any div element that does not already have one.
Your second case is a little tricky, and to be honest, I'd actually recommend not using XPath for it at all, but it can be done:
//a | //h1[not(following-sibling::a) or generate-id(.) != generate-id(following-sibling::a[1]/preceding-sibling::h1[1])]
This will match any a elements, or any h1 elements where no following a element exists before the next h1 element or the end of the document. As Dimitre pointed out though, this only works if you're using it from within XSLT, as generate-id is an XSLT function.
If you're not using it from within XSLT, you can use this rather contrived formula:
//a | //h1[not(following-sibling::a) or count(. | preceding-sibling::h1) != count(following-sibling::a[1]/preceding-sibling::h1)]
It works by matching h1 elements where the count of itself and all preceding h1 elements is not the same as the count of all h1 elements preceding the next a. There may be a more efficient way of doing it in XPath, but if it's going to get any more contrived than that, I'd definitely recommend not using XPath at all.
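As a rough illustration of how the XPath 1.0 formula above could be consumed from Java, here is a minimal sketch using the JDK's javax.xml.xpath API. The class and method names (HeadingPlaceholders, aValues) are made up for this example, the flat fragment is assumed to be wrapped in a <t> root element so that it parses, and the node-set is assumed to come back in document order, which is what the JDK's built-in XPath engine does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HeadingPlaceholders {
    // The XPath 1.0 expression from the answer: every a, plus every h1 whose group has no a.
    static final String EXPR =
            "//a | //h1[not(following-sibling::a) or "
            + "count(. | preceding-sibling::h1) != "
            + "count(following-sibling::a[1]/preceding-sibling::h1)]";

    // Hypothetical helper: one entry per selected node, null for each placeholder h1.
    static List<String> aValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(EXPR, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                // An h1 in the result means "this group had no a".
                values.add("a".equals(node.getNodeName()) ? node.getTextContent() : null);
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the flat fragment from the question, wrapped in a <t> root element.
        String xml = "<t><h1>heading1</h1><a>a1</a><b>b1</b>"
                + "<h1>heading2</h1><b>b2</b>"
                + "<h1>heading3</h1><a>a3</a><b>b3</b><c>c3</c></t>";
        System.out.println(aValues(xml));
    }
}
```

Under those assumptions the sketch should print [a1, null, a3]. Note that a group containing several a elements would contribute several entries, so this only lines up with the headings when each group has at most one a.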
Solution for the first problem:
This XPath expression:
/*/div/a
|
/*/div[not(a)]
When evaluated against the following XML document:
<t>
<div>
<a>a1</a>
<b>b1</b>
</div>
<div>
<b>b2</b>
</div>
<div>
<a>a3</a>
<b>b3</b>
<c>c3</c>
</div>
</t>
selects the following three nodes (a, div, a):
<a>a1</a>
<div>
<b>b2</b>
</div>
<a>a3</a>
In your Java array, any selected non-a element should be treated as (or replaced by) null.
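The replacement step described above might look like the following in Java. This is only a sketch: DivPlaceholders and aValues are hypothetical names, the document is passed in as a string, and the result order relies on the JDK's XPath engine returning node-sets in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DivPlaceholders {
    // Hypothetical helper: the a value of each div, or null where a div has no a child.
    static List<String> aValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/*/div/a | /*/div[not(a)]", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                // Any non-a node (here: a div without an a) stands for null.
                values.add("a".equals(node.getNodeName()) ? node.getTextContent() : null);
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<t><div><a>a1</a><b>b1</b></div><div><b>b2</b></div>"
                + "<div><a>a3</a><b>b3</b><c>c3</c></div></t>";
        System.out.println(aValues(xml));
    }
}
```

With the document shown above this should yield [a1, null, a3]. A div holding more than one a would add more than one entry, so the positions only match the div elements when each div has at most one a.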
Here is one solution to the second problem:
Use these XPath expressions for selecting the a elements from each group:
For the first group:
/*/h1[1]
/following-sibling::a
[not(/*/h1[2])
or
count(.|/*/h1[2]/preceding-sibling::a)
=
count(/*/h1[2]/preceding-sibling::a)
]
For the second group:
/*/h1[2]
/following-sibling::a
[not(/*/h1[3])
or
count(.|/*/h1[3]/preceding-sibling::a)
=
count(/*/h1[3]/preceding-sibling::a)
]
And for the 3rd group:
/*/h1[3]
/following-sibling::a
[not(/*/h1[4])
or
count(.|/*/h1[4]/preceding-sibling::a)
=
count(/*/h1[4]/preceding-sibling::a)
]
If count(/*/h1) is $cnt, generate $cnt such expressions (for i = 1 to $cnt) and evaluate all of them. The node-set selected by each expression either contains an a element or is empty. If the $k-th group (the nodes selected by the $k-th expression) contains an a, use its string value as the $k-th item of the wanted array; otherwise use null for the $k-th item.
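The procedure just described (count the h1 elements, generate one expression per group, evaluate each) could be sketched in Java roughly as follows. GroupExpressions, groupExpr and aValues are hypothetical names introduced for this illustration; the expression text follows the pattern of the three expressions above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class GroupExpressions {
    // Builds the expression for the k-th group (1-based), following the pattern above.
    static String groupExpr(int k) {
        String next = "/*/h1[" + (k + 1) + "]";
        return "/*/h1[" + k + "]/following-sibling::a"
                + "[not(" + next + ") or count(. | " + next + "/preceding-sibling::a)"
                + " = count(" + next + "/preceding-sibling::a)]";
    }

    // One entry per h1 group: the string value of its first a, or null if it has none.
    static List<String> aValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            int cnt = ((Double) xpath.evaluate("count(/*/h1)", doc, XPathConstants.NUMBER))
                    .intValue();
            List<String> values = new ArrayList<>();
            for (int k = 1; k <= cnt; k++) {
                NodeList group = (NodeList) xpath.evaluate(
                        groupExpr(k), doc, XPathConstants.NODESET);
                values.add(group.getLength() > 0 ? group.item(0).getTextContent() : null);
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the flat fragment from the question, wrapped in a <t> root element.
        String xml = "<t><h1>heading1</h1><a>a1</a><b>b1</b>"
                + "<h1>heading2</h1><b>b2</b>"
                + "<h1>heading3</h1><a>a3</a><b>b3</b><c>c3</c></t>";
        System.out.println(aValues(xml));
    }
}
```

Because each group is evaluated separately, the result always has exactly one entry per h1, regardless of how many a elements a group contains. The cost is one XPath evaluation per group.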
Here is an XSLT-based verification of the above XPath expressions:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="vGroup1" select=
"/*/h1[1]
/following-sibling::a
[not(/*/h1[2])
or
count(.|/*/h1[2]/preceding-sibling::a)
=
count(/*/h1[2]/preceding-sibling::a)
]
"/>
<xsl:variable name="vGroup2" select=
"/*/h1[2]
/following-sibling::a
[not(/*/h1[3])
or
count(.|/*/h1[3]/preceding-sibling::a)
=
count(/*/h1[3]/preceding-sibling::a)
]
"/>
<xsl:variable name="vGroup3" select=
"/*/h1[3]
/following-sibling::a
[not(/*/h1[4])
or
count(.|/*/h1[4]/preceding-sibling::a)
=
count(/*/h1[4]/preceding-sibling::a)
]
"/>
Group1: "<xsl:copy-of select="$vGroup1"/>"
Group2: "<xsl:copy-of select="$vGroup2"/>"
Group3: "<xsl:copy-of select="$vGroup3"/>"
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document (the OP did not provide a complete, well-formed XML document):
<t>
<h1>heading1</h1>
<a>a1</a>
<b>b1</b>
<h1>heading2</h1>
<b>b2</b>
<h1>heading3</h1>
<a>a3</a>
<b>b3</b>
<c>c3</c>
</t>
the three XPath expressions are evaluated and the nodes selected by each of them are output:
Group1: "<a>a1</a>"
Group2: ""
Group3: "<a>a3</a>"
Explanation:
We use the well-known Kayessian formula for the intersection of two nodesets:
$ns1[count(. | $ns2) = count($ns2)]
The result of evaluating this expression contains exactly the nodes that belong both to the nodeset $ns1 and the nodeset $ns2.
What remains is to substitute $ns1 and $ns2 with expressions that are relevant to the problem.
We substitute $ns1 with:
/*/h1[1]
/following-sibling::a
and we substitute $ns2 with:
/*/h1[2]
/preceding-sibling::a
In other words, the a elements that are between the first and second /*/h1 are the intersection of the a elements that are following siblings of /*/h1[1] and the a elements that are preceding siblings of /*/h1[2].
This expression is only problematic for the a elements that follow the last of the /*/h1 elements, because there is no next h1 and $ns2 is therefore empty. This is why we add a test for the non-existence of a next /*/h1 element and combine it, using or, with the count comparison.
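To make the intersection formula and its edge case more concrete, here is a small Java sketch that builds $ns1[count(. | $ns2) = count($ns2)] by plain string substitution. The names Intersection, intersect and texts are hypothetical, and the substitution assumes that $ns1 is a location path and that $ns2 is an absolute path (it is evaluated inside a predicate, so it must not depend on the context node).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class Intersection {
    // $ns1[count(. | $ns2) = count($ns2)] with both node-sets written inline.
    static String intersect(String ns1, String ns2) {
        return ns1 + "[count(. | " + ns2 + ") = count(" + ns2 + ")]";
    }

    // Hypothetical helper: text content of every node selected by expr.
    static List<String> texts(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<t><h1>heading1</h1><a>a1</a><b>b1</b>"
                + "<h1>heading2</h1><b>b2</b>"
                + "<h1>heading3</h1><a>a3</a><b>b3</b><c>c3</c></t>";
        // a elements between the first and the second h1.
        System.out.println(texts(xml, intersect(
                "/*/h1[1]/following-sibling::a", "/*/h1[2]/preceding-sibling::a")));
        // Last group without the not(...) guard: there is no h1[4], so $ns2 is empty.
        System.out.println(texts(xml, intersect(
                "/*/h1[3]/following-sibling::a", "/*/h1[4]/preceding-sibling::a")));
    }
}
```

The first call should select only a1. The second selects nothing even though a3 follows the third h1, which is exactly the case the extra not(/*/h1[4]) test is there to handle.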
Finally, as a guiding example for a Java implementation here is a complete XSLT transformation, which does something similar -- produces a serialized array, and can be mechanically translated to a corresponding Java solution:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my">
<xsl:output method="text"/>
<my:null>null</my:null>
<my:Q>"</my:Q>
<xsl:variable name="vNull" select="document('')/*/my:null"/>
<xsl:variable name="vQ" select="document('')/*/my:Q"/>
<xsl:template match="/">
<xsl:variable name="vGroup1" select=
"/*/h1[1]
/following-sibling::a
[not(/*/h1[2])
or
count(.|/*/h1[2]/preceding-sibling::a)
=
count(/*/h1[2]/preceding-sibling::a)
]
"/>
<xsl:variable name="vGroup2" select=
"/*/h1[2]
/following-sibling::a
[not(/*/h1[3])
or
count(.|/*/h1[3]/preceding-sibling::a)
=
count(/*/h1[3]/preceding-sibling::a)
]
"/>
<xsl:variable name="vGroup3" select=
"/*/h1[3]
/following-sibling::a
[not(/*/h1[4])
or
count(.|/*/h1[4]/preceding-sibling::a)
=
count(/*/h1[4]/preceding-sibling::a)
]
"/>
[<xsl:value-of select=
"concat($vQ[$vGroup1/self::a[1]],
$vGroup1/self::a[1],
$vQ[$vGroup1/self::a[1]],
$vNull[not($vGroup1/self::a[1])])"/>
<xsl:text>,</xsl:text>
<xsl:value-of select=
"concat($vQ[$vGroup2/self::a[1]],
$vGroup2/self::a[1],
$vQ[$vGroup2/self::a[1]],
$vNull[not($vGroup2/self::a[1])])"/>
<xsl:text>,</xsl:text>
<xsl:value-of select=
"concat($vQ[$vGroup3/self::a[1]],
$vGroup3/self::a[1],
$vQ[$vGroup3/self::a[1]],
$vNull[not($vGroup3/self::a[1])])"/>]
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the same XML document (above), the wanted, correct result is produced:
["a1",null,"a3"]
Update2:
Now the OP has added that he can use an XSLT solution. Here is one:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my" exclude-result-prefixes="xsl">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:key name="kFollowing" match="a"
use="generate-id(preceding-sibling::h1[1])"/>
<my:null/>
<xsl:variable name="vNull" select="document('')/*/my:null"/>
<xsl:template match="/*">
<xsl:copy-of select=
"h1/following-sibling::a[1]
|
h1[not(key('kFollowing', generate-id()))]"/>
=============================================
<xsl:apply-templates select="h1"/>
</xsl:template>
<xsl:template match="h1">
<xsl:variable name="vAsInGroup" select=
"key('kFollowing', generate-id())"/>
<xsl:copy-of select="$vAsInGroup[1] | $vNull[not($vAsInGroup)]"/>
</xsl:template>
</xsl:stylesheet>
This transformation implements two different solutions. The difference is in which element is used to represent "null". In the first case it is the h1 element. This isn't recommended, because any h1 already has its own meaning, which is different from "representing null". The second solution uses a special my:null element to represent null.
When this transformation is applied on the same XML document as above:
<t>
<h1>heading1</h1>
<a>a1</a>
<b>b1</b>
<h1>heading2</h1>
<b>b2</b>
<h1>heading3</h1>
<a>a3</a>
<b>b3</b>
<c>c3</c>
</t>
the two XPath expressions (containing XSLT key() references) are evaluated and the selected nodes are output (above and below the "=====" separator line, respectively):
<a>a1</a>
<h1>heading2</h1>
<a>a3</a>
=============================================
<a>a1</a>
<my:null xmlns:my="my:my" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>
<a>a3</a>
Note on performance:
Because keys are used, this solution will be significantly more efficient when more than one search is made, for example when the corresponding arrays for a, b, and c need to be produced.
I suggest you use the following, which might be rewritten to an xsl:function where the parent node name (here: div) is parametrized.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<root>
<aList><xsl:copy-of select="$divIncludingNulls//a"/></aList>
<bList><xsl:copy-of select="$divIncludingNulls//b"/></bList>
<cList><xsl:copy-of select="$divIncludingNulls//c"/></cList>
</root>
</xsl:template>
<xsl:variable name="divChild" select="distinct-values(//div/*/name())"/>
<xsl:variable name="divIncludingNulls">
<xsl:for-each select="//div">
<xsl:variable name="divElt" select="."/>
<div>
<xsl:for-each select="$divChild">
<xsl:variable name="divEltvalue" select="$divElt/*[name()=current()]"/>
<xsl:element name="{.}">
<xsl:choose>
<xsl:when test="$divEltvalue"><xsl:value-of select="$divEltvalue"/></xsl:when>
<xsl:otherwise>null</xsl:otherwise>
</xsl:choose>
</xsl:element>
</xsl:for-each>
</div>
</xsl:for-each>
</xsl:variable>
</xsl:stylesheet>
Applied to
<?xml version="1.0" encoding="UTF-8"?>
<root>
<div>
<a>a1</a>
<b>b1</b>
</div>
<div>
<b>b2</b>
</div>
<div>
<a>a3</a>
<b>b3</b>
<c>c3</c>
</div>
</root>
the output is
<?xml version="1.0" encoding="UTF-8"?>
<root>
<aList>
<a>a1</a>
<a>null</a>
<a>a3</a>
</aList>
<bList>
<b>b1</b>
<b>b2</b>
<b>b3</b>
</bList>
<cList>
<c>null</c>
<c>null</c>
<c>c3</c>
</cList>
</root>
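The stylesheet above needs an XSLT 2.0 processor. If only the JDK's XPath 1.0 support is available, a comparable padded list per element name could be built directly in Java by visiting each div in turn. The following is a sketch of that alternative, not a translation of the stylesheet: PaddedColumns and columns are hypothetical names, the element names are passed in explicitly rather than discovered with distinct-values, and a missing element becomes a real Java null instead of the text "null".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PaddedColumns {
    // For each requested child name, one value per div (null where the div lacks it).
    // Assumption: names are plain element names, usable as relative XPath steps.
    static Map<String, List<String>> columns(String xml, String... names) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList divs = (NodeList) xpath.evaluate("//div", doc, XPathConstants.NODESET);
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (String name : names) {
                List<String> column = new ArrayList<>();
                for (int i = 0; i < divs.getLength(); i++) {
                    // Relative path: first child with this name, or null if there is none.
                    Node child = (Node) xpath.evaluate(
                            name, divs.item(i), XPathConstants.NODE);
                    column.add(child == null ? null : child.getTextContent());
                }
                result.put(name, column);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><div><a>a1</a><b>b1</b></div><div><b>b2</b></div>"
                + "<div><a>a3</a><b>b3</b><c>c3</c></div></root>";
        System.out.println(columns(xml, "a", "b", "c"));
    }
}
```

For the input document above this should give a=[a1, null, a3], b=[b1, b2, b3] and c=[null, null, c3], matching the three lists in the XSLT output.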