Given markup like:
<p>
<code>foo</code><code>bar</code>
<code>jim</code> and then <code>jam</code>
</p>
I need to select the first three <code> elements, but not the last. The logic is: "Select all code elements that have a preceding-or-following sibling element that is also a code, unless there exist one or more text nodes with non-whitespace content between them."
Given that I am using Nokogiri (which uses libxml2) I can only use XPath 1.0 expressions.
Although a tricky XPath expression is desired, Ruby code/iterations to perform the same on a Nokogiri document are also acceptable.
Note that the CSS adjacent sibling selector ignores non-element nodes, and so selecting nokodoc.css('code + code') will incorrectly select the last <code> block:
Nokogiri.XML('<r><a/><b/> and <c/></r>').css('* + *').map(&:name)
#=> ["b", "c"]
Edit: More test cases, for clarity:
<section><ul>
<li>Go to <code>N</code> and
then <code>Y</code><code>Y</code><code>Y</code>.
</li>
<li>If you see <code>N</code> or <code>N</code> then…</li>
</ul>
<p>Elsewhere there might be: <code>N</code></p>
<p><code>N</code> across parents.</p>
<p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
<p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>
All the Y above should be selected. None of the N should be selected. The content of each <code> is used only to indicate which should be selected: you may not use the content to determine whether or not to select an element.
The context elements in which the <code> elements appear are irrelevant. They may appear in <li>, they may appear in <p>, they may appear in something else.
I want to select all the consecutive runs of <code> at once. It is not a mistake that there is a space character in the middle of one of the sets of Y.
Use:
//code
[preceding-sibling::node()[1][self::code]
or
preceding-sibling::node()[1]
[self::text()[not(normalize-space())]]
and
preceding-sibling::node()[2][self::code]
or
following-sibling::node()[1][self::code]
or
following-sibling::node()[1]
[self::text()[not(normalize-space())]]
and
following-sibling::node()[2][self::code]
]
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"//code
[preceding-sibling::node()[1][self::code]
or
preceding-sibling::node()[1]
[self::text()[not(normalize-space())]]
and
preceding-sibling::node()[2][self::code]
or
following-sibling::node()[1][self::code]
or
following-sibling::node()[1]
[self::text()[not(normalize-space())]]
and
following-sibling::node()[2][self::code]
]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<section><ul>
<li>Go to <code>N</code> and
then <code>Y</code><code>Y</code><code>Y</code>.
</li>
<li>If you see <code>N</code> or <code>N</code> then…</li>
</ul>
<p>Elsewhere there might be: <code>N</code></p>
<p><code>N</code> across parents.</p>
<p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
<p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>
the contained XPath expression is evaluated and the selected nodes are copied to the output:
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
//code[
(
following-sibling::node()[1][self::code]
or (
following-sibling::node()[1][self::text() and normalize-space() = ""]
and
following-sibling::node()[2][self::code]
)
)
or (
preceding-sibling::node()[1][self::code]
or (
preceding-sibling::node()[1][self::text() and normalize-space() = ""]
and
preceding-sibling::node()[2][self::code]
)
)
]
I think this does what you want, though I won’t claim you’d actually want to use it.
I’m assuming text nodes are always merged together so that there won’t be two adjacent to each other, which I believe is generally the case, but might not be if you’re doing DOM manipulations beforehand. I’ve also assumed that there won’t be any other elements between code elements, or that if there are they prevent selection like non-whitespace text.
I think this is what you want:
/p/code[not(preceding-sibling::text()[not(normalize-space(.)="")])]