
Reading settings in a Scrapy spider

Tags: python, scrapy

I wrote a small Scrapy spider. Here is my code:

class ElectronicsSpider(scrapy.Spider):
    name = "electronics"
    allowed_domains = ["www.olx.com"]
    start_urls = ['http://www.olx.com/']

    def parse(self, response):
        pass

I want to read name, allowed_domains and start_urls from the settings. How can I do this?

I tried importing:

from scrapy.settings import Settings

I also tried this:

def __init__(self, crawler):
    self.settings = crawler.settings

but I got None or an error. How can I read settings in my spider?

Asked by Vigneshwaran on Jul 21 '17 at 06:07



2 Answers

self.settings is not yet initialized in __init__(). You can access self.settings in start_requests() instead.

def start_requests(self):
    # self.settings is available by the time this method runs
    print(self.settings)
Answered by Aminah Nuraini on Sep 21 '22 at 15:09


from scrapy.utils.project import get_project_settings

settings = get_project_settings()
# 'NAME' is a custom key that must be defined in the project's settings.py
print(settings.get('NAME'))

This code reads values from the project's settings file.

Answered by Sellamani on Sep 21 '22 at 15:09