After scraping some data with a Scrapy spider:
    class Test_Spider(Spider):
        name = "test"

        def start_requests(self):
            for i in range(900, 902, 1):
                ........
                yield item
I pass the data to a pipeline object to be written to an SQLite table using SQLAlchemy:
    from sqlalchemy import create_engine, Column, Integer, MetaData, Table, Text

    class SQLlitePipeline(object):
        def __init__(self):
            _engine = create_engine("sqlite:///data.db")
            _connection = _engine.connect()
            _metadata = MetaData()
            _stack_items = Table("table1", _metadata,
                                 Column("id", Integer, primary_key=True),
                                 Column("detail_url", Text))
            _metadata.create_all(_engine)
            self.connection = _connection
            self.stack_items = _stack_items

        def process_item(self, item, spider):
            is_valid = True
I'd like to be able to set the table name as a variable instead of hardcoding it as "table1". How can this be done?
Assuming you pass this parameter through the command line (e.g. -s table="table1"), define a from_crawler method.
    from sqlalchemy import create_engine, Column, Integer, MetaData, Table, Text

    class SQLlitePipeline(object):
        @classmethod
        def from_crawler(cls, crawler):
            # Here, you get whatever value was passed through the "table" parameter
            settings = crawler.settings
            table = settings.get('table')
            # Instantiate the pipeline with your table
            return cls(table)

        def __init__(self, table):
            _engine = create_engine("sqlite:///data.db")
            _connection = _engine.connect()
            _metadata = MetaData()
            _stack_items = Table(table, _metadata,
                                 Column("id", Integer, primary_key=True),
                                 Column("detail_url", Text))
            _metadata.create_all(_engine)
            self.connection = _connection
            self.stack_items = _stack_items
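With that in place, the table name can be supplied when launching the spider, assuming the pipeline is enabled in ITEM_PIPELINES (the module path myproject.pipelines below is a placeholder for your own project layout):

    # settings.py
    ITEM_PIPELINES = {
        'myproject.pipelines.SQLlitePipeline': 300,
    }

    scrapy crawl test -s table=table1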
A simpler way to do this is to pass the argument on crawl:

    scrapy crawl test -a table=table1

Then get the value with spider.table:
    class TestScrapyPipeline(object):
        def process_item(self, item, spider):
            table = spider.table
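For completeness, here is a minimal sketch of how the pipeline from the question could use that attribute, assuming SQLAlchemy 1.x-style Core usage and that the item carries a detail_url field as in the question; open_spider and close_spider are standard Scrapy pipeline hooks:

    from sqlalchemy import create_engine, Column, Integer, MetaData, Table, Text

    class TestScrapyPipeline(object):
        def open_spider(self, spider):
            # Build the table once per crawl, named after the -a argument
            self.engine = create_engine("sqlite:///data.db")
            self.connection = self.engine.connect()
            metadata = MetaData()
            self.stack_items = Table(spider.table, metadata,
                                     Column("id", Integer, primary_key=True),
                                     Column("detail_url", Text))
            metadata.create_all(self.engine)

        def process_item(self, item, spider):
            # Insert the scraped field into the dynamically named table
            self.connection.execute(
                self.stack_items.insert().values(detail_url=item.get("detail_url"))
            )
            return item

        def close_spider(self, spider):
            self.connection.close()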