
Remove all nodes after a specified node [duplicate]

Tags: ruby, nokogiri

I'm grabbing a div of text from a URL and would like to remove everything that follows a paragraph with the class backtotop. I found a traverse snippet here on Stack Overflow that looks promising, but I can't figure out how to incorporate it so that @el contains only the content up to the first p.backtotop in the div.

My code:

require 'nokogiri'
require 'open-uri'

@doc = Nokogiri::HTML(URI.open(url)) # URI.open is provided by open-uri
@el = @doc.css("div")[0]

Traverse snippet:

doc = Nokogiri::HTML(code)
# at_css returns a single node; css returns a NodeSet, which never equals a node
stop_node = doc.at_css("p.backtotop")
doc.traverse do |node|
  break if node == stop_node
  # else, do whatever, e.g. `puts node.name`
end
ritchielee asked Sep 29 '11 15:09



1 Answer

  1. Find the div you want.
  2. Find the 'stop' item you want, and then find all the following siblings.
  3. Remove them.

For example:

<body>
  <div id="a">
    <h2>My Section</h2>
    <p class="backtotop">Back to Top</p>
    <p>More Content</p>
    <p>Even More Content</p>
  </div>
</body>

require 'nokogiri'
doc = Nokogiri::HTML(my_html) # my_html is a string holding the HTML above
div = doc.at('#a')
div.at('.backtotop').xpath('following-sibling::*').remove
puts div
#=> <div id="a">
#=>     <h2>My Section</h2>
#=>     <p class="backtotop">Back to Top</p>
#=>     
#=>     
#=>   </div>

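Here's a more complicated example, where the backtotop item may not be a direct child of the div. Starting at the stop item, remove its following siblings, then move up to its parent and repeat until you reach the div: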

<body>
  <div id="b">
    <h2>Another Section</h2>
    <section>
      <p class="backtotop">Back to Top</p>
      <p>More Content</p>
     </section>
    <p>Even More Content</p>
  </div>
</body>

require 'nokogiri'
doc = Nokogiri::HTML(my_html) # my_html is a string holding the HTML above
div = doc.at('#b')
n   = div.at('.backtotop')
until n == div
  n.xpath('following-sibling::*').remove
  n = n.parent
end

puts div
#=> <div id="b">
#=>     <h2>Another Section</h2>
#=>     <section><p class="backtotop">Back to Top</p>
#=>       
#=>      </section>
#=>   </div>

If your HTML is more complicated than the above then please provide an actual sample along with the result you want. This is good advice for any future question you ask.

Phrogz answered Nov 14 '22 17:11