
anemone Ruby with focus_crawl

Tags: ruby, rubygems

I'm setting up a crawl, but before I crawl an entire website I would like to fire off a test of 10 or so pages. I thought something like the code below would work, but I keep getting a NoMethodError:

    Anemone.crawl(self.url) do |anemone|
      anemone.focus_crawl do |crawled_page|
        crawled_page.links.slice(0..10)
        page = pages.find_or_create_by_url(crawled_page.url)
        logger.debug(page.inspect)
        page.check_for_term(self.term, crawled_page.body)
      end
    end

NoMethodError (private method `select' called for true:TrueClass):
    app/models/site.rb:14:in `crawl'
    app/controllers/sites_controller.rb:96:in `block in crawl'
    app/controllers/sites_controller.rb:95:in `crawl'

Basically, I want a way to crawl only 10 pages first, but I seem to be misunderstanding the basics here. Can someone help me out? Thanks!

tspore asked Jun 11 '26 13:06


1 Answer

Add this monkeypatch to your crawling file.

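    require 'anemone'

    # Reopen Anemone::Core to add a way to stop the crawl early.
    module Anemone
      class Core
        # Kill every worker thread that is still running.
        def kill_threads
          @tentacles.each do |thread|
            Thread.kill(thread) if thread.alive?
          end
        end
      end
    end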

Once the patch is in your crawling file, call kill_threads from your anemone.on_every_page block. Here is an example:

    @counter = 0
    Anemone.crawl("http://stackoverflow.com", :obey_robots_txt => true) do |anemone|
      anemone.on_every_page do |page|
        @counter += 1
        # Stop the crawl once more than 10 pages have been processed.
        anemone.kill_threads if @counter > 10
      end
    end

Source: https://github.com/chriskite/anemone/issues/24
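
As for the NoMethodError itself: as far as I can tell, Anemone uses the return value of the focus_crawl block as the list of links to follow and calls select on it. In the question's code the last expression in the block is page.check_for_term(...), which apparently returns true, and that would explain "private method `select' called for true:TrueClass". The result of crawled_page.links.slice(0..10) is thrown away (and 0..10 is 11 links, not 10). Below is a minimal sketch of a block that returns the links as its last expression. FakePage and FIRST_TEN_LINKS are hypothetical names of mine, and FakePage only stands in for Anemone::Page so the snippet runs without the gem.

```ruby
require 'uri'

# Hypothetical stand-in for Anemone::Page so this sketch runs without the gem.
# The only behaviour it copies from the real class is a `links` method.
FakePage = Struct.new(:url, :links)

# Block intended for focus_crawl: do any per-page work first, then make the
# array of links the LAST expression so it becomes the block's return value.
FIRST_TEN_LINKS = lambda do |crawled_page|
  # ... per-page work (saving, logging, term checks) would go here ...
  crawled_page.links.first(10)
end

# Build a fake page with 25 outgoing links and run the block on it.
links = (1..25).map { |i| URI("http://example.com/page#{i}") }
page = FakePage.new(URI("http://example.com/"), links)

selected = FIRST_TEN_LINKS.call(page)
puts selected.size # prints 10
```

With the gem loaded, this would presumably be wired in as anemone.focus_crawl(&FIRST_TEN_LINKS). Note that it only caps the links followed from each page, not the total number of pages crawled, so for a hard limit you would still combine it with the counter above.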

sunnyrjuneja answered Jun 13 '26 14:06