I am using Nokogiri to pull the <h1> and <title> tags, but I am having trouble getting these:
<meta name="description" content="I design and develop websites and applications.">
<meta name="keywords" content="web designer,web developer">
I have this code:
url = 'https://en.wikipedia.org/wiki/Emma_Watson'
page = Nokogiri::HTML(open(url))
puts page.css('title')[0].text
puts page.css('h1')[0].text
puts page.css('description')
puts META DESCRIPTION
puts META KEYWORDS
I looked in the docs and didn't find anything. Would I use regex to do this?
Thanks.
Another solution: you can use XPath or CSS selectors. With XPath:
puts page.xpath('/html/head/meta[@name="description"]/@content').to_s
puts page.xpath('/html/head/meta[@name="keywords"]/@content').to_s
Here's how I'd go about it:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<meta name="description" content="I design and develop websites and applications.">
<meta name="keywords" content="web designer,web developer">
EOT
contents = %w[description keywords].map { |name|
doc.at("meta[name='#{name}']")['content']
}
contents # => ["I design and develop websites and applications.", "web designer,web developer"]
Or:
contents = doc.search("meta[name='description'], meta[name='keywords']").map { |n|
n['content']
}
contents # => ["I design and develop websites and applications.", "web designer,web developer"]