
Why does normalize-space(text()) ignore internal nodes when selecting by text?

Tags: html, xpath

Why, in the following example, can I use //label[text()[normalize-space() = 'some label']] or //label[normalize-space(text()) = 'some label'] to select a label by its text and ignore the span's content? I really want to understand this. The spec at http://www.w3.org/TR/xpath/#function-normalize-space says nothing about this behavior. This is exactly what I want, but I also want to know why this solution works :)

BTW, which syntax is better: //label[text()[normalize-space() = 'some label']] vs //label[normalize-space(text()) = 'some label'] and why?

<label>
<span>some span</span>
  some label   
</label>

<label>
    other label
<span>other span</span>
</label>

I'm looking forward to your helpful answers :)

master.py asked Nov 08 '14 16:11


2 Answers

text() returns all text nodes that are children of the current node (the label).

But "some span" is not a child of the label; it is a child of the span.

You can use .//text() to get all text nodes that are descendants of the label, or span/text() to get the text nodes of the span.

--

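To match against descendant text nodes, you need to use //label[.//text()[normalize-space() = 'some label']] instead of //label[normalize-space(.//text()) = 'some label'], because the latter only works if there is a single text node. The leading dot matters: inside a predicate, a bare //text() starts from the document root, so it would test every text node in the document rather than those under the label.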
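The difference between child and descendant text nodes can be sketched in Python. This is only a model and not an XPath engine: it walks the first label from the question with the standard library's xml.dom.minidom, and the helper names child_text and descendant_text are invented here for illustration.

```python
from xml.dom import minidom

# First <label> from the question, with its whitespace written out explicitly.
LABEL = "<label>\n<span>some span</span>\n  some label   \n</label>"

def child_text(node):
    # Models text(): text nodes that are direct children of `node`.
    return [c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE]

def descendant_text(node):
    # Models .//text(): text nodes anywhere below `node`, in document order.
    found = []
    for c in node.childNodes:
        if c.nodeType == c.TEXT_NODE:
            found.append(c.data)
        elif c.nodeType == c.ELEMENT_NODE:
            found.extend(descendant_text(c))
    return found

label = minidom.parseString(LABEL).documentElement

print(child_text(label))       # ['\n', '\n  some label   \n']
print(descendant_text(label))  # ['\n', 'some span', '\n  some label   \n']
```

"some span" only shows up in the descendant list, which is why a predicate built on text() never sees it.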

BeniBela answered Sep 23 '22 05:09


This has nothing to do with normalize-space(), and everything to do with text().

text() is short for child::text(), and selects the text nodes that are immediate children of the label element. Unless you are stripping whitespace text nodes, the label element in your example has two child text nodes: one is all whitespace, and the other contains "some label" surrounded by whitespace.

BTW, which syntax is better: //label[text()[normalize-space() = 'some label']] vs //label[normalize-space(text()) = 'some label'] and why?

They do different things; the one that is better is the one that does what you want to achieve.

In XPath 1.0, the first expression selects label elements that have a child text node whose value, after whitespace normalization, equals "some label". The second selects label elements whose first child text node, after whitespace normalization, equals "some label". That's because normalize-space() (like all functions that expect a string), if you give it a node-set, takes the string value of the first node in the node-set.
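The XPath 1.0 behavior described above can be modeled roughly in Python. This sketch does not run XPath; it imitates the two predicates with the standard library's xml.dom.minidom. The function names (normalize_space, any_child_matches, first_child_matches) and the wrapping root element are assumptions made for the example, and no whitespace-only text nodes are stripped.

```python
import re
from xml.dom import minidom

# Both labels from the question, wrapped in a hypothetical <root> element.
DOC = (
    "<root>"
    "<label>\n<span>some span</span>\n  some label   \n</label>"
    "<label>\n    other label\n<span>other span</span>\n</label>"
    "</root>"
)

def normalize_space(s):
    # Collapse runs of XML whitespace to one space, then trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def child_text(node):
    # Models text(): text nodes that are direct children of `node`.
    return [c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE]

def any_child_matches(label, wanted):
    # Models label[text()[normalize-space() = wanted]]:
    # true if ANY child text node normalizes to `wanted`.
    return any(normalize_space(t) == wanted for t in child_text(label))

def first_child_matches(label, wanted):
    # Models label[normalize-space(text()) = wanted] in XPath 1.0:
    # only the FIRST child text node is converted to a string
    # (an empty node-set converts to the empty string).
    texts = child_text(label)
    return normalize_space(texts[0] if texts else "") == wanted

some, other = minidom.parseString(DOC).getElementsByTagName("label")

print(any_child_matches(some, "some label"))      # True
print(first_child_matches(some, "some label"))    # False
print(any_child_matches(other, "other label"))    # True
print(first_child_matches(other, "other label"))  # True
```

In this model the second form misses the first label, because its first child text node is the whitespace before the span. That suggests the second form matches that label only when whitespace-only text nodes are stripped or the markup has no whitespace before the span.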

In XPath 2.0, the first expression selects label elements that have a child text node whose value, after whitespace normalization, equals "some label". The second selects label elements whose single child text node, after whitespace normalization, equals "some label", but raises an error if the label element has more than one child text node. That's because normalize-space() (like all functions that expect a string) atomizes its argument and reports a type error if the atomized sequence contains more than one item.
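The XPath 2.0 rule can be imitated in the same spirit. The sketch below is a Python model rather than a real XPath 2.0 processor: normalize_space_xpath2 and second_expression_matches are hypothetical names, Python's TypeError stands in for the processor's type error (XPTY0004), and the single-text-node label is an invented example.

```python
import re
from xml.dom import minidom

def normalize_space(s):
    # Collapse runs of XML whitespace to one space, then trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def child_text(node):
    # Models text(): text nodes that are direct children of `node`.
    return [c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE]

def normalize_space_xpath2(items):
    # Models the XPath 2.0 signature, which accepts at most one string:
    # a longer sequence is a type error, an empty one yields "".
    if len(items) > 1:
        raise TypeError("XPTY0004: more than one item where one string is expected")
    return normalize_space(items[0]) if items else ""

def second_expression_matches(label, wanted):
    # Models label[normalize-space(text()) = wanted] in XPath 2.0.
    return normalize_space_xpath2(child_text(label)) == wanted

# A label with one child text node, and the question's first label (two).
single = minidom.parseString("<label>  some label  </label>").documentElement
mixed = minidom.parseString(
    "<label>\n<span>some span</span>\n  some label   \n</label>"
).documentElement

print(second_expression_matches(single, "some label"))  # True
try:
    second_expression_matches(mixed, "some label")
except TypeError as exc:
    print("type error:", exc)
```

The first form, which applies normalize-space() to each text node inside its own predicate, never passes more than one item at a time and so avoids the error.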

Michael Kay answered Sep 22 '22 05:09