Get link and href text from html doc with Nokogiri & Ruby?

Tags: ruby, nokogiri

I'm trying to use the Nokogiri gem to extract all the URLs on a page, along with their link text, and store each link text and URL pair in a hash.

<html>
    <body>
        <a href=#foo>Foo</a>
        <a href=#bar>Bar </a>
    </body>
</html>

I would like to return

{"Foo" => "#foo", "Bar" => "#bar"}

sunnyrjuneja asked Feb 17 '12 21:02


1 Answer

Here's a one-liner, where doc is the parsed Nokogiri document:

Hash[doc.xpath('//a[@href]').map {|link| [link.text.strip, link["href"]]}]

#=> {"Foo"=>"#foo", "Bar"=>"#bar"}

Split up a bit, which is arguably more readable:

h = {}
doc.xpath('//a[@href]').each do |link|
  h[link.text.strip] = link['href']
end
puts h

#=> {"Foo"=>"#foo", "Bar"=>"#bar"}

Mark Thomas answered Nov 06 '22 10:11