I'm working with SQLAlchemy 1.1 and Scrapy. I'm currently using a pipeline to store extracted data in an SQLite table via SQLAlchemy. I'd like to dynamically create a table to accommodate the item being scraped.
My static pipeline element looks like:
from sqlalchemy import create_engine, Column, Integer, MetaData, Table, Text
from sqlalchemy.exc import IntegrityError

class SQLlitePipeline(object):

    def __init__(self):
        # settings and table_name are defined elsewhere in the project
        db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("value", Text),
                             Column("value2", Text))
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item, spider):
        try:
            ins_query = self.stack_items.insert().values(
                value=item['value'],
                value2=item['value2'])
            self.connection.execute(ins_query)
        except IntegrityError:
            print('THIS IS A DUP')
        return item
items.py:
import scrapy

class Filtered_Item(scrapy.Item):
    value = scrapy.Field()
    value2 = scrapy.Field()
How can I modify the pipeline above to dynamically create the table and insert the filtered item's values, instead of having them hard-coded as they are now?
There's actually a package out there that can help you out with this.
Check out: dataset: databases for lazy people
Here's an excerpt from the page:
Features
Automatic schema:
If a table or column is written that does not exist in the database, it will be created automatically.
Upserts:
Records are either created or updated, depending on whether an existing version can be found.
Query helpers:
Simple queries such as all rows in a table or all distinct values across a set of columns.
Compatibility:
Being built on top of SQLAlchemy, dataset works with all major databases, such as SQLite, PostgreSQL and MySQL.
Scripted exports:
Data can be exported based on a scripted configuration, making the process easy and replicable.
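To give a sense of how this could slot into your pipeline, here is a minimal sketch of a dataset-based version. The class name, database path and the table name "filtered_items" are placeholders I picked for illustration, not anything your project already defines:

import dataset

class DynamicSQLlitePipeline(object):

    def __init__(self):
        # dataset creates the table and any missing columns on first write,
        # so there is no static Table/Column definition to maintain.
        # "sqlite:///data.db" and "filtered_items" are assumed values;
        # point them at your own settings.
        self.db = dataset.connect("sqlite:///data.db")
        self.table = self.db["filtered_items"]

    def process_item(self, item, spider):
        # dict(item) exposes the scraped fields as a plain mapping, so
        # whatever fields the item defines are inserted (and any new
        # columns created) automatically.
        self.table.insert(dict(item))
        return item

If duplicates are a concern, dataset's upsert can stand in for the insert, e.g. self.table.upsert(dict(item), ['value']) to key on the value column.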