How do I extract the child text with Nokogiri?

Question

I encountered this HTML:

<div class='featured'>
    <h1>
        How to extract this?
        <span>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</span>
        <span class="moredetail ">
            <a href="/hello" title="hello">hello</a>
        </span>
        <div class="clear"></div>
    </h1>
</div>

I want to extract the <h1> text "How to extract this?". How do I do so?

I tried the following code, but the text of the other elements is appended. I am not sure how to exclude them so that I get only the <h1> text itself.

doc = Nokogiri::HTML(open(url))      
records = doc.css(".featured h1")

Joshua Cheek · Accepted Answer

#css returns a collection; use #at_css to get the first matching node. All of its contents, including text, are children, and in this case the text is its first child. You could also use something like children.reject(&:element?) if you want all the children that aren't elements.

data = '
<div class="featured">
    <h1>
        How to extract this?
        <span>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</span>
        <span class="moredetail ">
            <a href="/hello" title="hello">hello</a>
        </span>
        <div class="clear"></div>
    </h1>
</div>
'

require 'nokogiri'
text = Nokogiri::HTML(data).at_css('.featured h1').children.first.text
text       # => "\n        How to extract this?\n        "
text.strip # => "How to extract this?"

Alternatively, you can use XPath:

Nokogiri::HTML(data).at_xpath('//*[@class="featured"]/h1/text()').text

Tags:

ruby

ruby-on-rails

nokogiri

TonyTakeshi
