
Using Mechanize gem to return a collection of links based on their position in the DOM

I am struggling with Mechanize. I wish to "click" on a set of links which can only be identified by their position (all links within div#content) or by their href.

I have tried both identification methods without success.

From the documentation, I could not figure out how to return a collection of links (for clicking) based on their position in the DOM rather than on attributes of the link itself.

Secondly, the documentation suggests that you can use :href to match a partial href:

page = agent.get('http://foo.com/').links_with(:href => "/something")

but the only way I can get it to return a link is by passing a fully qualified URL, e.g.

page = agent.get('http://foo.com/').links_with(:href => "http://foo.com/something/a")

This is not very useful if I want to return a collection of links with hrefs such as

http://foo.com/something/a
http://foo.com/something/b
http://foo.com/something/c
etc...

Am I doing something wrong? Do I have unrealistic expectations?

pingu asked May 08 '12 13:05


3 Answers

Part II: The value you pass to :href has to be an exact match by default, so the href in your example would only match <a href="/something"></a> and not <a href="foo.com/something/a"></a>.

Instead, pass in a regex so that it matches a substring of the href, like so:

page = agent.get('http://foo.com/').links_with(:href => %r{/something/})
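To see why the regex matches where the string did not, it may help to look at the comparison in plain Ruby. This is only a sketch: it assumes, as the Mechanize source suggests, that each criterion is tested against the attribute with ===. The hrefs array and the matching helper are hypothetical stand-ins and not part of the Mechanize API.

```ruby
# Hypothetical href values, standing in for the links on a page.
hrefs = [
  "http://foo.com/something/a",
  "http://foo.com/something/b",
  "http://foo.com/other",
  "/something"
]

# Assumed to mirror the comparison Mechanize performs: criterion === value.
def matching(hrefs, criterion)
  hrefs.select { |href| criterion === href }
end

exact   = matching(hrefs, "/something")     # String#=== is plain equality
partial = matching(hrefs, %r{/something/})  # Regexp#=== is a pattern match

puts exact.inspect
puts partial.inspect
```

With a string, only the href that equals "/something" is selected; with the regex, every href containing "/something/" is selected.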

edit: Part I: To select only the links inside a particular element, add a Nokogiri-style search method to your chain. Like this:

page = agent.get('http://foo.com/').search("div#content").links_with(:href => %r{/something/})    # **

OK, that doesn't work: search("div#content") returns a Nokogiri node set rather than a Mechanize page, so links_with is not available on it. However, you can extract the links from the Nokogiri object using the css method. I would suggest something like:

page = agent.get('http://foo.com/').search("div#content").css("a")

If that doesn't work, I'd suggest checking out http://nokogiri.org/tutorials
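The nodes returned by css("a") are plain Nokogiri elements, so they have no click method. One option is to read each node's href and resolve it against the page URL before fetching it. The sketch below shows only the resolving step, using Ruby's standard URI library; page_url and hrefs are hypothetical values, with hrefs standing in for what you would read from the nodes.

```ruby
require "uri"

# Hypothetical values: the URL of the fetched page, and the href
# attributes read from the Nokogiri nodes (e.g. node["href"]).
page_url = "http://foo.com/"
hrefs    = ["/something/a", "something/b", "http://foo.com/something/c"]

# Resolve each href against the page URL so that relative and absolute
# hrefs both end up as full URLs.
absolute = hrefs.map { |href| URI.join(page_url, href).to_s }

puts absolute
```

Each resulting URL can then be passed to agent.get in turn.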

vlasits answered Nov 15 '22 00:11


The nth link:

page.links[n-1]

The first 5 links:

page.links[0..4]

Links with 'something' in the href:

page.links_with :href => /something/

pguardiario answered Nov 15 '22 01:11


You can build Mechanize links from Nokogiri nodes. See the source code of the links() method:

# File lib/mechanize/page.rb, line 352
def links
  @links ||= %w{ a area }.map do |tag|
    search(tag).map do |node|
      Link.new(node, @mech, self)
    end
  end.flatten
end

So that means:

the_links = page.search("valid_selector").map do |node|
  Mechanize::Page::Link.new(node, agent, page)
end

This gives you Mechanize::Page::Link objects, with the useful href, text, uri and click methods.

nurettin answered Nov 15 '22 00:11