Nokogiri text node contents

Question

Is there any clean way to get the contents of text nodes with Nokogiri? Right now I'm using

some_node.at_xpath( "//whatever" ).first.content

which seems really verbose for just getting text.

Mark Thomas · Accepted Answer

You want only the text?

doc.search('//text()').map(&:text)

Maybe you don't want all the whitespace and noise. If you want only the text nodes containing a word character,

doc.search('//text()').map(&:text).delete_if{|x| x !~ /\w/}

Edit: It appears you only wanted the text content of a single node:

some_node.at_xpath( "//whatever" ).text

the Tin Man · Answer

Just look for text nodes:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<p>This is a text node </p>
<p> This is another text node</p>
</body>
</html>
EOT

doc.search('//text()').each do |t|
  t.replace(t.content.strip)
end

puts doc.to_html

Which outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>This is a text node</p>
<p>This is another text node</p>
</body></html>

BTW, your code example doesn't work. at_xpath( "//whatever" ).first is redundant and will fail. at_xpath finds only the first occurrence and returns a Node, so first would be superfluous even if it worked, and it won't, because Node doesn't have a first method.

I have <data><foo>bar</foo></data>, how do I get at the "bar" text without doing doc.at_xpath( "//data/foo" ).children.first.content?

Assuming doc contains the parsed DOM:

doc.to_xml # => "<?xml version=\"1.0\"?>
<data>
  <foo>bar</foo>
</data>
"

Get the first occurrence:

doc.at('foo').text       # => "bar"
doc.at('//foo').text     # => "bar"
doc.at('/data/foo').text # => "bar"

Get all occurrences and take the first one:

doc.search('foo').first.text      # => "bar"
doc.search('//foo').first.text    # => "bar"
doc.search('data foo').first.text # => "bar"

Tags:

ruby

nokogiri

cbmanica
