
Search by class in Nokogiri nodeset

I got the name of a CSS class from a Nokogiri node. Now I want to find all the nodes that also have the same class attached.

I don't know which HTML tag the element I'm looking for uses, or how deeply it is nested. All I know is which class to search for.

I have already tried:

doc.xpath("//*[contains(@class, #{css})]")

But this seems to return WAY too many elements.

Also I have tried:

doc.xpath("//*[@class, #{css}]")

and this returns nothing.

I want to get the elements that contain that class, not every element that surrounds an element with that class.

Is it possible to do this with Nokogiri?

user2926430 asked Oct 26 '25 20:10


2 Answers

As I said in my comment, .css() or .search() can find all elements of a given class.

Here's an example from a scraper I wrote a while ago. It finds the only .content div on the page (at() returns the first match only), then finds all the .col divs inside it, loops through them, and prints the text of each one's h5 title.

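# at() returns only the first element that matches the selector.
content = page.at('.content')
# css() returns every match below that node, at any depth.
content.css('.col').each do |col|
  puts col.at('h5').text
end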
Jimeux answered Oct 28 '25 09:10


Assuming that the class name is stored in class_name, I think that

doc.xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' #{class_name} ')]")

is what you're looking for.

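This matches every element whose class list includes class_name. For example, if class_name is 'box', it matches both elements like div class="box" and elements like div class="box left". Padding the attribute value and the name with spaces is what keeps it from matching a longer name that merely contains the text, such as "boxed".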

If you only want to match elements like div class="box", that is, elements whose class attribute is exactly the name you're looking for and nothing else, then you could use this:

doc.xpath("//*[@class=\"#{class_name}\"]")
egwspiti answered Oct 28 '25 11:10


