
How to pass parameter to a scrapy pipeline object

After scraping some data with a Scrapy spider:

class Test_Spider(Spider):

    name = "test"
    def start_requests(self):
        for i in range(900,902,1):
            ........
            yield item

I pass the data to a pipeline object to be written to an SQLite table using SQLAlchemy:

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Text

class SQLlitePipeline(object):

    def __init__(self):
        _engine = create_engine("sqlite:///data.db")
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table("table1", _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("detail_url", Text))
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item, spider):
        is_valid = True

I'd like to be able to set the table name through a variable instead of hardcoding it as "table1". How can this be done?

asked Dec 08 '16 by user1592380

2 Answers

Assuming you pass this parameter through the command line (e.g. -s table="table1"), define a from_crawler method.

@classmethod
def from_crawler(cls, crawler):
    # Here, you get whatever value was passed through the "table" parameter
    settings = crawler.settings
    table = settings.get('table')

    # Instantiate the pipeline with your table
    return cls(table)

def __init__(self, table):
    _engine = create_engine("sqlite:///data.db")
    _connection = _engine.connect()
    _metadata = MetaData()
    _stack_items = Table(table, _metadata,
                         Column("id", Integer, primary_key=True),
                         Column("detail_url", Text))
    _metadata.create_all(_engine)
    self.connection = _connection
    self.stack_items = _stack_items
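To see the parameterized table name in isolation, here is a minimal standalone sketch using only the standard-library sqlite3 module (no Scrapy or SQLAlchemy required); the function name make_table and the sample URL are illustrative:

```python
import sqlite3

def make_table(connection, table_name):
    # The table name is a parameter rather than a hardcoded "table1".
    # SQL placeholders cannot be used for identifiers, so validate the
    # name before interpolating it into the statement.
    if not table_name.isidentifier():
        raise ValueError("unsafe table name: %r" % table_name)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS %s "
        "(id INTEGER PRIMARY KEY, detail_url TEXT)" % table_name)

conn = sqlite3.connect(":memory:")
make_table(conn, "table1")
conn.execute("INSERT INTO table1 (detail_url) VALUES (?)",
             ("http://example.com",))
row = conn.execute("SELECT detail_url FROM table1").fetchone()
print(row[0])  # http://example.com
```

SQLAlchemy's Table() accepts any string as its first argument in the same way, which is why passing the name into __init__ via from_crawler works.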
answered Nov 14 '22 by lucasnadalutti


A simpler way to do this is to pass the argument as a spider argument on crawl (here "test" is the spider's name from the question):

scrapy crawl test -a table=table1

Then get the value with spider.table:

class TestScrapyPipeline(object):
    def process_item(self, item, spider):
        table = spider.table
        # ... write item to the table ...
        return item
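This works because Scrapy copies -a command-line arguments onto the spider instance as attributes. The stand-in class below only mimics that behavior for illustration; it is not Scrapy's actual Spider class:

```python
class FakeSpider(object):
    # Simplified stand-in: scrapy.Spider.__init__ similarly stores
    # keyword arguments (from -a flags) as instance attributes.
    def __init__(self, name=None, **kwargs):
        self.name = name
        self.__dict__.update(kwargs)

class TestScrapyPipeline(object):
    def process_item(self, item, spider):
        # The pipeline reads the table name off the spider.
        return spider.table

# Equivalent to: scrapy crawl test -a table=table1
spider = FakeSpider(name="test", table="table1")
pipeline = TestScrapyPipeline()
print(pipeline.process_item({}, spider))  # table1
```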
answered Nov 14 '22 by daaawx