I want to replace the inner_text in all paragraphs in my XHTML document.
I know I can get all text with Nokogiri like this:
doc.xpath("//text()")
But I only want to operate on text in paragraphs. How can I select all text in paragraphs without affecting any anchor text that may exist in links?
For example: <p>some text <a href="/">This should not be changed</a> another one</p>
For text which is an immediate child of a paragraph, use //p/text():
irb> h = '<p>some text <a href="/">This should not be changed</a> another one</p>'
=> ...
irb> doc = Nokogiri::HTML(h)
=> ...
irb> doc.xpath '//p/text()'
=> [#<Nokogiri::XML::Text:0x80ac2e04 "some text ">, #<Nokogiri::XML::Text:0x80ac26c0 " another one">]
For text which is a descendant (immediate or not) of a paragraph, use //p//text(). To exclude text nodes that have an anchor as a parent, you can subtract them:
irb> doc.xpath('//p//text()') - doc.xpath('//p//a/text()')
=> [#<Nokogiri::XML::Text:0x80ac2e04 "some text ">, #<Nokogiri::XML::Text:0x80ac26c0 " another one">]
There is probably a way to do it with one call, but my XPath knowledge doesn't go that deep.