I wrote a small Scrapy spider. Here is my code:
class ElectronicsSpider(scrapy.Spider):
    name = "electronics"
    allowed_domains = ["www.olx.com"]
    start_urls = ['http://www.olx.com/']

    def parse(self, response):
        pass
My question is: I want to read name, allowed_domains, and start_urls from the settings. How can I do this?
I tried importing

from scrapy.settings import Settings

and I also tried this:

def __init__(self, crawler):
    self.settings = crawler.settings

but I got None or an error. How can I read settings in my spider?
self.settings is not yet initialized in __init__(). You can check self.settings in start_requests():
def start_requests(self):
    print(self.settings)
Alternatively, use get_project_settings:

from scrapy.utils.project import get_project_settings

settings = get_project_settings()
print(settings.get('NAME'))

This way you can read values from the settings file.