I'm having trouble figuring out why I can't get the meta keywords to parse with Nokogiri. In the following example, pulling the a href links works properly, but I can't work out how to pull the keywords.
This is the code I have thus far:
require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open("http://www.cnn.com"))
doc.xpath('//a/@href').each do |node|
# Swapping in this line for the one above prints nothing:
# doc.xpath("//meta[@name='Keywords']").each do |node|
  puts node.text
end
This successfully prints every a href value on the page, but when I swap in the keywords line it doesn't show anything. I've tried several variations with no luck. I suspect the .text call after node is wrong, but I'm not sure.
My apologies for how rough this code is; I'm doing my best to learn here.
You're correct: the problem is .text. It returns the text between the opening tag and the closing tag. Since meta tags are empty elements, this gives you an empty string. You want the value of the content attribute instead.
doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
puts attr.value
end
If you know there will be only one meta tag named "Keywords", you don't need to loop through the results; you can take the first item directly:
puts doc.xpath("//meta[@name='Keywords']/@content").first.value
Note, however, that this raises a NoMethodError if the page has no meta tag named "Keywords" (first returns nil, and nil has no value method), so the loop version might be preferable.