
XSLT Select all nodes containing a specific substring

Tags: xslt, xpath

I'm trying to write an XPath expression that selects the nodes containing a specific word, in this case "Lockwood". The expected count is 3, and both of these paths give me 3:

count(//*[contains(./*,'Lockwood')])
count(BusinessLetter/*[contains(../*,'Lockwood')])

But when I try to output the text of each specific node

//*[contains(./*,'Lockwood')][1]
//*[contains(./*,'Lockwood')][2]
//*[contains(./*,'Lockwood')][3]

Node 1 ends up containing all the text and nodes 2 and 3 are blank.

Can someone please tell me what's happening or what I'm doing wrong?

Thanks.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="XPathFunctions.xsl"?>
<BusinessLetter>
 <Head>
  <SendDate>November 29, 2005</SendDate>
  <Recipient>
   <Name Title="Mr.">
    <FirstName>Joshua</FirstName>
    <LastName>Lockwood</LastName>
   </Name>
   <Company>Lockwood &amp; Lockwood</Company>
   <Address>
    <Street>291 Broadway Ave.</Street>
    <City>New York</City>
    <State>NY</State>
    <Zip>10007</Zip>
    <Country>United States</Country>
   </Address>
  </Recipient>
 </Head>
 <Body>
  <List>
   <Heading>Along with this letter, I have enclosed the following items:</Heading>
   <ListItem>two original, execution copies of the Webucator Master Services Agreement</ListItem>
   <ListItem>two original, execution copies of the Webucator Premier Support for Developers Services Description between Lockwood &amp; Lockwood and Webucator, Inc.</ListItem>
  </List>
  <Para>Please sign and return all four original, execution copies to me at your earliest convenience.  Upon receipt of the executed copies, we will immediately return a fully executed, original copy of both agreements to you.</Para>
  <Para>Please send all four original, execution copies to my attention as follows:

   <Person>
    <Name>
     <FirstName>Bill</FirstName>
     <LastName>Smith</LastName>
    </Name>
    <Address>
     <Company>Webucator, Inc.</Company>
     <Street>4933 Jamesville Rd.</Street>
     <City>Jamesville</City>
     <State>NY</State>
     <Zip>13078</Zip>
     <Country>USA</Country>
    </Address>
   </Person>
  </Para>
  <Para>If you have any questions, feel free to call me at <Phone>800-555-1000 x123</Phone> or e-mail me at <Email>[email protected]</Email>.</Para>
 </Body>
 <Foot>
  <Closing>
   <Name>
    <FirstName>Bill</FirstName>
    <LastName>Smith</LastName>
   </Name>
   <JobTitle>VP of Operations</JobTitle>
  </Closing>
 </Foot>
</BusinessLetter>
Mike asked Jan 12 '11 19:01


1 Answer

But when I try to output the text of each specific node

//*[contains(./*,'Lockwood')][1] 
//*[contains(./*,'Lockwood')][2] 
//*[contains(./*,'Lockwood')][3] 

Node 1 ends up containing all the text and nodes 2 and 3 are blank

This is a FAQ.

//SomeExpression[1]

is not equivalent to

(//SomeExpression)[1]

The former selects every SomeExpression node that is the first SomeExpression child of its parent.

The latter selects the first (in document order) of all SomeExpression nodes in the whole document.
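
A rough way to see the difference without an XSLT processor is Python's standard xml.etree.ElementTree, whose limited path syntax applies a positional predicate after a tag name once per parent, as XPath does. This is only a sketch: the sample document is a hypothetical, cut-down version of the letter, and because ElementTree does not accept parenthesized paths, the second form is imitated by indexing the list of all matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, cut down from the letter in the question.
SAMPLE = """<BusinessLetter>
  <Head><Name><FirstName>Joshua</FirstName><LastName>Lockwood</LastName></Name></Head>
  <Foot><Name><FirstName>Bill</FirstName><LastName>Smith</LastName></Name></Foot>
</BusinessLetter>"""

root = ET.fromstring(SAMPLE)

# Like //FirstName[1]: every FirstName that is the first FirstName child of its parent.
first_per_parent = [e.text for e in root.findall(".//FirstName[1]")]

# Like //FirstName[2]: no Name has a second FirstName child, so nothing is selected.
second_per_parent = root.findall(".//FirstName[2]")

# Like (//FirstName)[1]: ElementTree cannot parse parenthesized paths,
# so collect all matches in document order and index the list instead.
first_overall = root.findall(".//FirstName")[0].text

print(first_per_parent)   # ['Joshua', 'Bill']
print(second_per_parent)  # []
print(first_overall)      # Joshua
```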

How does this apply to your problem?

//*[contains(./*,'Lockwood')][1]

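This selects all elements whose first child element has a string value containing 'Lockwood' (in XPath 1.0, contains() converts the node-set ./* to a string using only its first node in document order) and that are the first such child of their parent. Each of the three matching elements is the first such child of its parent, so all three are selected. If the result is then output with xsl:value-of in XSLT 1.0, only the first node in document order is used; that node is BusinessLetter, whose string value is the entire text of the document.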

//*[contains(./*,'Lockwood')][2]

There is no element that satisfies the predicate and is the second such child of its parent. No nodes are selected.

//*[contains(./*,'Lockwood')][3]

There is no element that satisfies the predicate and is the third such child of its parent. No nodes are selected.

Solution:

Use:

(//*[contains(./*,'Lockwood')])[1]
(//*[contains(./*,'Lockwood')])[2]
(//*[contains(./*,'Lockwood')])[3]

Each of these selects exactly the Nth element (N = {1,2,3}) selected by //*[contains(./*,'Lockwood')], respectively: BusinessLetter, Recipient and Body.
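
To check this by hand, here is a minimal Python sketch that emulates both forms over a trimmed copy of the letter. It is not a real XPath engine: the helper names (matches, nth_per_parent, nth_overall) are made up for illustration, and the XPath 1.0 rule that contains(./*, ...) tests only the first child element is coded by hand, since the standard library's ElementTree has no contains().

```python
import xml.etree.ElementTree as ET

# The letter from the question, trimmed to the parts that matter here.
LETTER = """<BusinessLetter>
 <Head>
  <SendDate>November 29, 2005</SendDate>
  <Recipient>
   <Name Title="Mr.">
    <FirstName>Joshua</FirstName>
    <LastName>Lockwood</LastName>
   </Name>
   <Company>Lockwood &amp; Lockwood</Company>
  </Recipient>
 </Head>
 <Body>
  <List>
   <Heading>Along with this letter, I have enclosed the following items:</Heading>
   <ListItem>Services Description between Lockwood &amp; Lockwood and Webucator, Inc.</ListItem>
  </List>
 </Body>
 <Foot>
  <Closing>
   <Name>
    <FirstName>Bill</FirstName>
    <LastName>Smith</LastName>
   </Name>
  </Closing>
 </Foot>
</BusinessLetter>"""


def matches(elem, needle):
    """Emulate contains(./*, needle) in XPath 1.0: only the FIRST child
    element is converted to a string and searched."""
    children = list(elem)
    return bool(children) and needle in "".join(children[0].itertext())


def nth_per_parent(root, needle, n):
    """Emulate //*[contains(./*, needle)][n]: the n-th match within each parent."""
    # Sibling groups: the document element alone, then each element's children.
    groups = [[root]] + [list(parent) for parent in root.iter()]
    picked = set()
    for siblings in groups:
        hits = [e for e in siblings if matches(e, needle)]
        if len(hits) >= n:
            picked.add(id(hits[n - 1]))
    # Report the picked elements in document order.
    return [e.tag for e in root.iter() if id(e) in picked]


def nth_overall(root, needle, n):
    """Emulate (//*[contains(./*, needle)])[n]: the n-th match in document order."""
    hits = [e.tag for e in root.iter() if matches(e, needle)]
    return hits[n - 1:n]


root = ET.fromstring(LETTER)
for n in (1, 2, 3):
    print(n, nth_per_parent(root, "Lockwood", n), nth_overall(root, "Lockwood", n))
# 1 ['BusinessLetter', 'Recipient', 'Body'] ['BusinessLetter']
# 2 [] ['Recipient']
# 3 [] ['Body']
```

The left column reproduces what the question observed (everything at [1], nothing at [2] and [3]); the right column is what the parenthesized form returns.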

Remember:

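The [] operator has higher priority (precedence) than the // abbreviation: //X[1] is short for /descendant-or-self::node()/child::X[1], so the predicate applies to the child::X step, once per parent.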

Dimitre Novatchev answered Oct 14 '22 01:10