
What does contains() do in XPath?

Tags: xml, xpath

I have two almost identical tables, the only difference being the input tag in the first one:

Table #1

  <table>
    <tbody>
      <tr>
        <td>
          <div>
            <input type="text" name="" value=""/>
          </div>
        </td>
      </tr>
    </tbody>
  </table>

Table #2

  <table>
    <tbody>
      <tr>
        <td>
          <div></div>
        </td>
      </tr>
    </tbody>
  </table>

When I use the XPath //table//tbody//tr[position()=1 and contains(.,input)], it returns the first row of both tables, not just the first row of the first table as I expect.

However, the XPath //table//tbody//tr[position()=1]//input returns just the input in the first table.

So, what am I doing wrong? Why is the same input associated with both tables? Am I misusing the . here somehow?

asked Dec 24 '22 10:12 by ephemeris


2 Answers

Due to an unfortunate choice in function names [1], many people mistake the purpose of the contains() function in XPath:

  • XPath contains() does not check for element containment.
  • XPath contains() checks for substring containment.

Therefore, tr[contains(.,input)] doesn't do what you think it does. It actually selects tr elements whose string-value contains a substring equal to the string-value of the first immediate child input element; see this answer for further details. (Interestingly, such a predicate always simplifies to true. If the tr has a child input, the hierarchical definition of string-value guarantees that the parent's string-value contains the child's. If it has none, as in both of your tables, where the input is a deeper descendant rather than a child, the empty node-set converts to the empty string, which every string contains.) Anyway, that's clearly not your intent.
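To make the string semantics concrete, here is a minimal sketch that emulates the predicate by hand in Python's standard library. It is not an XPath engine: string_value and emulate_contains_dot_input are hypothetical helper names, and the emulation assumes XPath 1.0 conversion rules (a node-set converts to the string-value of its first node, or to the empty string when it is empty).

```python
import xml.etree.ElementTree as ET

# The two tables from the question.
TABLE_1 = """<table><tbody><tr><td><div>
  <input type="text" name="" value=""/>
</div></td></tr></tbody></table>"""

TABLE_2 = "<table><tbody><tr><td><div></div></td></tr></tbody></table>"


def string_value(elem):
    """XPath string-value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())


def emulate_contains_dot_input(tr):
    """Hand-rolled stand-in for the predicate contains(., input) on a tr."""
    # A bare name test like `input` looks at immediate children only.
    child_inputs = tr.findall("input")
    # Assumed XPath 1.0 rule: first node's string-value, or "" if empty.
    needle = string_value(child_inputs[0]) if child_inputs else ""
    # contains() is a substring test, nothing more.
    return needle in string_value(tr)


results = {}
for name, markup in (("table 1", TABLE_1), ("table 2", TABLE_2)):
    tr = ET.fromstring(markup).find("tbody/tr")
    results[name] = emulate_contains_dot_input(tr)
    print(name, results[name])
```

Both rows pass the emulated predicate, because neither tr has an input as an immediate child, so the substring being searched for is the empty string in both cases.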

To check for descendant element containment, use .//input instead. If it's tr elements that you wish to select, place it in the predicate on tr, as your first XPath attempted to do:

//table//tbody//tr[position()=1 and .//input]

or, if it's really table elements with an input descendant that you wish to select, place it in a predicate on table (as shown by @Andersson):

//table[.//input]
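The effect of the two corrected expressions can be sketched in Python with the standard library. ElementTree supports only a subset of XPath, so the predicates are emulated with find(".//input") rather than evaluated directly; the id attributes and the helper has_input_descendant are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET

# The two tables from the question; the id attributes are hypothetical labels.
DOC = """<body>
  <table id="t1"><tbody><tr><td><div>
    <input type="text" name="" value=""/>
  </div></td></tr></tbody></table>
  <table id="t2"><tbody><tr><td><div></div></td></tr></tbody></table>
</body>"""

root = ET.fromstring(DOC)


def has_input_descendant(elem):
    """Stand-in for the predicate [.//input]: any input at any depth below elem."""
    # Compare against None: an element with no children is falsy in ElementTree.
    return elem.find(".//input") is not None


# In the spirit of //table[.//input]
tables_with_input = [
    t.get("id") for t in root.iter("table") if has_input_descendant(t)
]

# In the spirit of //table//tbody//tr[position()=1 and .//input]
first_rows_with_input = []
for table in root.iter("table"):
    for tbody in table.iter("tbody"):
        rows = tbody.findall("tr")
        if rows and has_input_descendant(rows[0]):
            first_rows_with_input.append(table.get("id"))

print(tables_with_input)
print(first_rows_with_input)
```

Only the first table is selected in both cases, which is the result the question expected from contains().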

Why XPath contains() should have been named string-contains()

[1] In the context of XML, which is so strongly based upon the notion of hierarchy, it is natural to assume that contains refers to hierarchical containment. Of the 24 times the word contains appears in the original XPath specification, 19 times it means hierarchical node containment; only 5 times does it mean substring containment. It's no wonder that confusion over contains() exists. The XPath substring contains() function should have been named string-contains().

answered Dec 25 '22 22:12 by kjhughes


You should try

//table[.//input]

to fetch the table nodes that have an input descendant.

answered Dec 26 '22 00:12 by Andersson