I'm working on a crawl, but before I crawl an entire website, I would like to run a test on ten or so pages. I thought something like the code below would work, but I keep getting a NoMethodError:
Anemone.crawl(self.url) do |anemone|
  anemone.focus_crawl do |crawled_page|
    crawled_page.links.slice(0..10)
    page = pages.find_or_create_by_url(crawled_page.url)
    logger.debug(page.inspect)
    page.check_for_term(self.term, crawled_page.body)
  end
end
NoMethodError (private method `select' called for true:TrueClass):
app/models/site.rb:14:in `crawl'
app/controllers/sites_controller.rb:96:in `block in crawl'
app/controllers/sites_controller.rb:95:in `crawl'
Basically I want a way to first crawl only 10 pages, but I seem to be missing the basics here. Can someone help me out? Thanks!!
Add this monkeypatch to your crawling file.
module Anemone
  class Core
    def kill_threads
      @tentacles.each { |thread|
        Thread.kill(thread) if thread.alive?
      }
    end
  end
end
Here is an example of how to use it. After adding the monkeypatch to your crawling file, call kill_threads from your anemone.on_every_page block once enough pages have been crawled:
@counter = 0
Anemone.crawl("http://stackoverflow.com", :obey_robots => true) do |anemone|
  anemone.on_every_page do |page|
    @counter += 1
    # Stop the crawl once more than 10 pages have been processed.
    if @counter > 10
      anemone.kill_threads
    end
  end
end
Source: https://github.com/chriskite/anemone/issues/24
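As for the NoMethodError in the question: the message says `select' was called on true, which suggests the crawler calls select on whatever the focus_crawl block returns. In the question's block the slice is not the last expression, so the block returns the result of check_for_term instead (presumably true). Below is a minimal sketch of that behavior in plain Ruby. The links_to_follow method is a hypothetical stand-in, not Anemone's actual code, and the example.com URLs are made up for illustration.

```ruby
# Hypothetical stand-in for how a crawler might consume the block's
# return value; this is NOT Anemone's actual implementation.
def links_to_follow(page_links, &focus_block)
  # Use the block's return value if a block was given, otherwise all links.
  links = focus_block ? focus_block.call(page_links) : page_links
  # Filter the result; this fails unless `links` behaves like an Array.
  links.select { |link| link.start_with?("http") }
end

# Made-up links standing in for crawled_page.links.
page_links = (1..25).map { |i| "http://example.com/page#{i}" }

# Case 1: the slice is not the last expression, so the block returns true.
begin
  links_to_follow(page_links) do |links|
    links.slice(0..10)
    true # stands in for the return value of a later call in the block
  end
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end

# Case 2: the slice is the last expression, so the block returns an Array.
limited = links_to_follow(page_links) do |links|
  links.slice(0..10)
end
puts limited.size
```

If this is what is happening, moving the slice to the last line of the focus_crawl block, and doing the per-page work in on_every_page, should avoid the error. Note that 0..10 is an inclusive range, so it keeps 11 links; 0...10 keeps exactly 10. Also note that limiting the links followed per page is not the same as limiting the total number of pages crawled, which is what the counter approach above does.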