Ruby Mechanize get elements with specified text

Question

I am trying to parse the contents of a website using Mechanize and I am stuck. The content that I want is inside li tags, which are not always in the same order.

Let's suppose that we have the following, where the order of the li tags is not always the same and sometimes some of them may be missing entirely.

<div class="details">
  <ul>
    <li><span>title 1</span> ": here are the details"</li>
    <li><span>title 2</span> ": here are the details"</li>
    <li><span>title 3</span> ": here are the details"</li>
    <li><span>title 4</span> ": here are the details"</li>
  </ul>
</div>

What I want is to get only the details of the li whose span text is, for example, title 3. What I have tried is the following, which gives me the details from the first li:

puts page.at('.details').at('span', :text => "title 3").at("+ *").text

Is there a way to do what I want using Mechanize, or should I also use other means?

Rodri_gore · Accepted Answer

page.search(".details").at("span:contains('title 3')").parent.text

Explanation: at accepts either a CSS or an XPath selector. To keep this readable and close to your approach, this answer uses a CSS selector. Standard CSS cannot select elements by their text, but Nokogiri supports the jQuery-style :contains pseudo-class, so it can be used here.

The selector returns the span element, so to reach its parent li element, call parent and then read its text.

Jeff LaJoie · Answer

Since you're looking to do this using Mechanize (and I see one of the comments recommends using Nokogiri instead), you should be aware that Mechanize is built on Nokogiri, so you can use any Nokogiri functionality through Mechanize.

From the docs at http://mechanize.rubyforge.org/Mechanize.html:

Mechanize.html_parser = Nokogiri::XML

So you can accomplish this using XPath and the mechanize page.search method.

page.search("//div[@class='details']/ul/li[span='title 3']").text

This should give you the text of the li element that you're looking for. (I have not verified the .text call, but the XPath does work.)

You can test the XPath here: http://www.xpathtester.com/saved/51c5142c-dbef-4206-8fbc-1ba567373fb2

pguardiario · Answer

A cleaner CSS approach:

page.at('.details li:has(span[text()="title 3"])')

Tags: css, ruby, mechanize

George Karanikas
