Having loaded a (X)HTML page, I'm trying to get the value of a meta tag's "content" attribute. For example, given:
<meta name="author" content="John Smith" />
I'd like to extract the value "John Smith".
I know how to do that using XPath and understand that CSS was meant primarily for element selection but Nokogiri supports defining custom CSS pseudoclasses which I thought could be used as follows:
class CSSext
def attr(nodeset, tag)
nodeset.first.attribute_nodes.find_all {|node| node.name == tag}
end
end
doc = Nokogiri::HTML(open(someurl))
doc.css("meta[name='name']:attr('content')", CSSext.new)
However, this returns the same result as
doc.css("meta[name='name']")
What gives? Nokogiri uses the same engine underneath for both CSS and XPath searches so anything that's possible in XPath should be doable in CSS. How should I go about extracting the attribute value?
Why not just?
doc.at("meta[name='author']")['content']
As far as I understand, pseudoclasses can be used to filter the nodeset only, but not to replace the nodeset with some other value such as the value of one of the nodes's attribute.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With