I'm using Nokogiri to select the 'keywords' meta tag like this:
puts page.parser.xpath("//meta[@name='keywords']").to_html
One of the pages I'm working with has the keywords name written with a capital "K", which has motivated me to make the query case insensitive so that it matches both of these:
<meta name="keywords"> and <meta name="Keywords">
So, my question is: What is the best way to make a nokogiri selection case insensitive?
EDIT: Tomalak's suggestion below works great for this specific problem. I'd also like to use this example to understand Nokogiri better, though, and I have a couple of questions that I haven't been able to answer by searching. For example, are the regex 'pseudo classes' described in the Nokogiri docs appropriate for a problem like this?
I'm also curious about the matches?() method in nokogiri. I have not been able to find any clarification on the method. Does it have anything to do with the 'matches' concept in XPath 2.0 (and therefore could it be used to solve this problem)?
Thanks very much.
Nokogiri allows custom XPath functions. The nokogiri docs that you link to show an inline class definition for when you're only using it once. If you have a lot of custom functions or if you use the case-insensitive match a lot, you may want to define it in a class.
class XpathFunctions
  def case_insensitive_equals(node_set, str_to_match)
    node_set.find_all { |node| node.to_s.downcase == str_to_match.to_s.downcase }
  end
end
Then call it like any other XPath function, passing in an instance of your class as the 2nd argument.
page.parser.xpath("//meta[case_insensitive_equals(@name,'keywords')]",
                  XpathFunctions.new).to_html
In your Ruby method, node_set will be bound to a Nokogiri::XML::NodeSet. In the case where you're passing in an attribute value like @name, it will be a NodeSet with a single Nokogiri::XML::Attr. So calling to_s on it gives you its value. (Alternatively, you could use node.value.)
Unlike using XPath translate, where you have to specify every character, this works on all the characters and character encodings that Ruby works on.
Also, if you're interested in doing other things besides case-insensitive matching that XPath 1.0 doesn't support, it's just Ruby at this point. So this is a good starting point.
Wrapped for legibility:
puts page.parser.xpath("
  //meta[
    translate(
      @name,
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
      'abcdefghijklmnopqrstuvwxyz'
    ) = 'keywords'
  ]
").to_html
There is no "to lower case" function in XPath 1.0, so you have to use translate() for this kind of thing. Add accented letters as necessary.