
In a Scrapy bot, how do I call one function from within another?

I know this is a newbie question, and it's a basic Python question, but it's within the context of Scrapy and I can't find the answer anywhere.

When I run this bot code:

import scrapy

from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["lib-web.org"]
    start_urls = [
        "http://www.lib-web.org/united-states/public-libraries/michigan/"
    ]

    count = 0

    def increment(self):
        global count
        count += 1

    def getCount(self):
        global count
        return count

    def parse(self, response):
        increment()
        for sel in response.xpath('//div/div/div/ul/li'):
            item = DmozItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('p/text()').extract()
            x = getCount()
            print x
            yield item

DmozItem:

import scrapy

class DmozItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()

I get this error:

File "/Users/Admin/scpy_projs/tutorial/tutorial/spiders/dmoz_spider.py", line 23, in parse
    increment()
NameError: global name 'increment' is not defined

Why can't I call increment() from within parse(self, response)? How can I make this work?

Thanks for any help.

ryan71 asked Nov 13 '15 18:11


1 Answer

increment() is an instance method of your spider - use self.increment() to call it.

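Also, there is no need for globals here - define count as an instance variable. The global statement refers to module-level names, so it never reaches the count defined in the class body.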

Fixed version:

import scrapy

from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["lib-web.org"]
    start_urls = [
        "http://www.lib-web.org/united-states/public-libraries/michigan/"
    ]

    def __init__(self,  *args, **kwargs):
        super(DmozSpider, self).__init__(*args, **kwargs)

        self.count = 0

    def increment(self):
        self.count += 1

    def getCount(self):
        return self.count

    def parse(self, response):
        self.increment()

        for sel in response.xpath('//div/div/div/ul/li'):
            item = DmozItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('p/text()').extract()
            x = self.getCount()
            print(x)  # parenthesized form works on both Python 2 and Python 3

            yield item
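
To see why the bare call fails, here is a small sketch with no Scrapy involved. The class and method names (Counter, broken, working) are made up for illustration. It assumes Python 3, where the message reads "name 'increment' is not defined" without the word "global", but the cause is the same: inside a method, a bare name is looked up in the local, module-level and builtin scopes, never in the class body.

```python
class Counter:
    # Class attribute: stored on Counter itself, not in the module's globals,
    # which is why "global count" in the question could not find it.
    count = 0

    def increment(self):
        # Reads the current value through self and stores the result
        # as an attribute on this particular instance.
        self.count += 1

    def broken(self):
        # Bare name: Python does not search the class body from inside
        # a method, so this raises NameError.
        increment()

    def working(self):
        # Going through self finds the method on the instance's class.
        self.increment()
        return self.count


c = Counter()

try:
    c.broken()
except NameError as exc:
    error = str(exc)

result = c.working()
```

Note that self.count += 1 leaves the class-level Counter.count untouched and sets the value on the instance, which is one reason initializing self.count in __init__ is the clearer choice.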

You can also define count as a property.
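
One possible reading of the property suggestion, sketched outside Scrapy: keep the number in a private attribute and expose it read-only, so callers write counter.count instead of calling getCount(). PageCounter and _count are hypothetical names; in the spider the same pattern would live on DmozSpider.

```python
class PageCounter:
    def __init__(self):
        # Leading underscore marks the attribute as internal by convention.
        self._count = 0

    @property
    def count(self):
        # Read-only view of the counter; no setter is defined.
        return self._count

    def increment(self):
        self._count += 1


counter = PageCounter()
counter.increment()
counter.increment()
current = counter.count  # attribute-style access, no parentheses

# Assigning to a property without a setter raises AttributeError.
try:
    counter.count = 10
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True
```

The upside over a plain self.count is that other code cannot overwrite the value by accident; the cost is a few extra lines.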

alecxe answered Oct 11 '22 01:10