
Nokogiri find text in paragraphs

I want to replace the inner_text in all paragraphs in my XHTML document.

I know I can get all the text with Nokogiri like this:

doc.xpath("//text()")

But I only want to operate on text in paragraphs. How can I select all the text in paragraphs without affecting any anchor text inside links?

# For example: <p>some text <a href="/">This should not be changed</a> another one</p>
astropanic asked May 08 '10 10:05



1 Answer

For text that is an immediate child of a paragraph, use //p/text():

irb> h = '<p>some text <a href="/">This should not be changed</a> another one</p>'
=> ...
irb> doc = Nokogiri::HTML(h)
=> ...
irb> doc.xpath '//p/text()'
=> [#<Nokogiri::XML::Text:0x80ac2e04 "some text ">, #<Nokogiri::XML::Text:0x80ac26c0 " another one">]

For text that is a descendant (immediate or not) of a paragraph, use //p//text(). To exclude the text nodes whose parent is an anchor, you can subtract them:

irb> doc.xpath('//p//text()') - doc.xpath('//p//a/text()')
=> [#<Nokogiri::XML::Text:0x80ac2e04 "some text ">, #<Nokogiri::XML::Text:0x80ac26c0 " another one">]

There is probably a way to do it with one call, but my XPath knowledge doesn't go that deep.

jeem answered Nov 10 '22 22:11
