
Nokogiri html parsing question

Tags: ruby, nokogiri

I'm having trouble figuring out why I can't get keywords to parse properly through Nokogiri. In the following example, printing the a href values works properly, but I cannot figure out how to pull the meta keywords.

This is the code I have thus far:

.....

doc = Nokogiri::HTML(open("http://www.cnn.com"))
doc.xpath('//a/@href').each do |node|
#doc.xpath("//meta[@name='Keywords']").each do |node|

puts node.text

....

This successfully prints all of the a href values in the page, but when I try to use it for keywords it doesn't show anything. I've tried several variations of this with no luck. I assume that the ".text" call after node is wrong, but I'm not sure.

My apologies for how rough this code is, I'm doing my best to learn here.

paradoxic asked Aug 09 '10 16:08



1 Answer

You're correct, the problem is text. The text method returns the text between an element's opening tag and closing tag. Since meta tags are empty elements, this gives you the empty string. You want the value of the "content" attribute instead.

doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
  puts attr.value
end

If you know that there will be only one meta tag with the name "Keywords", you don't actually need to loop through the results, but can take the first item directly like this:

puts doc.xpath("//meta[@name='Keywords']/@content").first.value

Note, however, that this will raise an error if there is no meta tag with the name "Keywords" (first returns nil, and nil has no value method), so the first option might be preferable.

sepp2k answered Nov 03 '22 02:11
