
nokogiri and xpath -- nested looping with data set

I'm trying to loop through the td elements in each tr, but I'm having issues with the inner loop below. It appears that the XPath pattern '*/td' is not returning any results. I'm expecting to see the data inside the td tags printed to stdout. I'm using Nokogiri.

I'm pasting this into my rails console:

require 'nokogiri'
f = File.open("public/index.html")
doc = Nokogiri::HTML(f)
f.close

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
  puts "row= " + row.to_s
  row.xpath('*/td').each do |td|
    puts "td= " + td
  end
end

And here's the output from the console:

row= <tr id="208894">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td>
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td>
<td headers="WhoIsOnDutyTableLevel1:header:3">0</td>
</tr>
row= <tr id="207792">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td>
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td>
<td headers="WhoIsOnDutyTableLevel1:header:3">5</td>
</tr>
=> 0

Here's the html I'm parsing:

<table class="duty-report-level1" id="WhoIsOnDutyTableLevel1">
<caption></caption>
<thead>

<tr>
<th id="WhoIsOnDutyTableLevel1:header:1" class="duty-report-lt-header">c</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1">
<table class="duty-report-level2" id="WhoIsOnDutyTableLevel2">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1">Group Name</th><th id="WhoIsOnDutyTableLevel1:header:2">Group Time Zone</th><th id="WhoIsOnDutyTableLevel1:header:3">Default Devices</th><th id="WhoIsOnDutyTableLevel1:header:4">Supervisors</th>

</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/GroupDetails.do;jsessionid=17gaw4aw5pv8s?_data=TJZuNquzHUgWcre8AVcKpAFRUsezgPKzbHn7hwtTf9Ei0C2PJ8QYcKIy8OkorCWT8HDTAzkon1ls%0D%0AefuHC1N%2F0SLQLY8nxBhwesdd7Zeg6NzvCfuzRqLg5g%3D%3D" name="team1" id="team1" class="details">Team 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2" class="centered-text">US/Pacific</td><td headers="WhoIsOnDutyTableLevel1:header:3" class="centered-text"><img src="/static/images/icon_boolean_false.png" alt="No" border="0"></td><td headers="WhoIsOnDutyTableLevel1:header:4">
<values>
</values><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z7AnuRhH67H6AixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="mgr1" id="mgr1" class="details">Mgr 1</a>
<br>








</td>
</tr>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="4">
<table class="duty-report-level3" id="WhoIsOnDutyTableLevel3">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1" class="th-left">a</th><th id="WhoIsOnDutyTableLevel1:header:2" class="">b</th>
</tr>
</thead>

<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="2">
<table class="duty-report-level4" id="WhoIsOnDutyTableLevel4">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1">Recipient</th><th id="WhoIsOnDutyTableLevel1:header:2">Category</th><th id="WhoIsOnDutyTableLevel1:header:3">Escalation</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr id="208894">

<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">0</td>
</tr>
<tr id="207792">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">5</td>
</tr>




</tbody>
</table>

</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
sybind asked Oct 21 '25 02:10


1 Answer

You need a minor change to your XPath:

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
  # puts "row= " + row.to_s
  row.xpath('./td').each do |td|
    puts "td= " + td.text
  end
end

Which outputs:

td= User 1
td= PERSON
td= 0
td= User 2
td= PERSON
td= 5

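Using ./td as the XPath for td basically means "from this point look down one": it selects the td elements that are direct children of the current row. Your original */td looks two levels down (td elements inside any child element of the row), and since the row's children are the td elements themselves, it matches nothing.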

Personally, unless you absolutely need XPath, I recommend using CSS selectors. They are more readable and often much simpler:

doc.search('#WhoIsOnDutyTableLevel4 tbody tr').each do |row|
  row.search('td').each do |td|
    puts "td= " + td.text
  end
end

I recommend using search instead of css or xpath, and at instead of at_css or at_xpath. Both search and at accept either a CSS selector or an XPath expression, so there is no real magic going on when you choose one over the other, and you only have to remember two methods.

the Tin Man answered Oct 23 '25 01:10