
XPath 'or' behaving like union ('|') with libxml2

I have XML documents like:

<rootelement>
<myelement>test1</myelement>
<myelement>test2</myelement>
<myelement type='specific'>test3</myelement>
</rootelement>

I'd like to retrieve the specific myelement, and if it's not present, then the first one. So I write:

/rootelement/myelement[@type='specific' or position()=1]

The XPath spec states about the 'or expression' that:

The right operand is not evaluated if the left operand evaluates to true

The problem is that libxml2-2.6.26 seems to apply the union of both expressions, returning a "2 Node Set" (for example using xmllint --shell).

Is this a libxml2 bug, or am I doing something wrong?

foudfou asked Aug 16 '10 14:08


2 Answers

Short answer: your selector doesn't express what you think it does.


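The or operator is a boolean operator that is evaluated separately for each candidate node, so the predicate as a whole selects the union of the nodes that satisfy either condition.

The part of the spec you quoted ("The right operand is not evaluated...") only describes standard boolean short-circuiting while the predicate is evaluated for a single node; it does not stop the predicate from being tested against the other nodes.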

Here's why you get a 2-node set for your example input: XPath looks at every myelement that's a child of rootelement, and applies the [@type='specific' or position()=1] part to each such node to determine whether or not it matches the selector.

  1. <myelement>test1</myelement> does not match @type='specific', but it does match position()=1, so it matches the whole selector.
  2. <myelement>test2</myelement> does not match @type='specific', and it also does not match position()=1, so it does not match the whole selector.
  3. <myelement type='specific'>test3</myelement> matches @type='specific' (so XPath does not have to test its position - that's the short-circuiting part) so it matches the whole selector.

The first and last <myelement>s match the whole selector, so it returns a 2-node set.
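
The per-node evaluation described above can be mimicked in a few lines of Python. This is only an illustrative sketch: it uses the standard library's xml.etree.ElementTree rather than libxml2, and the helper name matches_predicate is made up for this example. It hand-codes the predicate [@type='specific' or position()=1] to show that each node is tested on its own.

```python
import xml.etree.ElementTree as ET

XML = """<rootelement>
<myelement>test1</myelement>
<myelement>test2</myelement>
<myelement type='specific'>test3</myelement>
</rootelement>"""


def matches_predicate(node, position):
    # Hypothetical helper mirroring [@type='specific' or position()=1].
    # Python's `or` short-circuits too: the position check is skipped
    # whenever the attribute test is already true for this node.
    return node.get("type") == "specific" or position == 1


root = ET.fromstring(XML)
candidates = root.findall("myelement")

# XPath positions are 1-based, so start counting at 1.
selected = [
    node.text
    for position, node in enumerate(candidates, start=1)
    if matches_predicate(node, position)
]
print(selected)  # ['test1', 'test3']
```

Short-circuiting changes how much work is done for a single node, not which nodes end up in the result.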

The easiest way to select elements the way you want to is to do it in two steps. Here's the pseudocode (I don't know what context you're actually using XPath in, and I'm not that familiar with writing XPath-syntax selectors):

  1. Select elements that match /rootelement/myelement[@type='specific']
  2. If elements is empty, select elements that match /rootelement/myelement[position()=1]
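
The two steps above might look like this in Python. It is a minimal sketch under some assumptions: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, so the paths are written relative to the root element instead of starting with /rootelement, and select_preferred is a hypothetical name chosen for illustration.

```python
import xml.etree.ElementTree as ET


def select_preferred(root):
    """Return the 'specific' myelement children, or the first child as a fallback."""
    # Step 1: try the specific elements first.
    elements = root.findall("myelement[@type='specific']")
    if not elements:
        # Step 2: nothing specific found, fall back to the first myelement.
        elements = root.findall("myelement[1]")
    return elements


with_specific = ET.fromstring(
    "<rootelement>"
    "<myelement>test1</myelement>"
    "<myelement>test2</myelement>"
    "<myelement type='specific'>test3</myelement>"
    "</rootelement>"
)
without_specific = ET.fromstring(
    "<rootelement>"
    "<myelement>test1</myelement>"
    "<myelement>test2</myelement>"
    "</rootelement>"
)

print([e.text for e in select_preferred(with_specific)])     # ['test3']
print([e.text for e in select_preferred(without_specific)])  # ['test1']
```

The same two-query pattern should carry over to any XPath binding, since it relies only on checking whether the first result set is empty.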

Matt Ball answered Nov 10 '22 07:11


@Matt Ball explained very well the cause of your problem.

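Here is an XPath one-liner selecting exactly what you want. The first operand of the union selects every myelement with type='specific'; the second selects the first myelement only when the top element has no such child, so at most one of the two operands is ever non-empty: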

/*/myelement[@type='specific'] | /*[not(myelement[@type='specific'])]/myelement[1] 

Dimitre Novatchev answered Nov 10 '22 08:11