Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sqlalchemy : Dynamically create table from Scrapy item

I'm working with sqlalchemy 1.1 and scrapy. I'm currently using a a pipeline to store extracted data in a sqllite table via sqlalchemy . I'd like to dynamically create a table to accommodate the item being scraped.

My static pipeline element looks like:

class SQLlitePipeline(object):

    def __init__(self):
        db_path = "sqlite:///"+settings.SETTINGS_PATH+"\\data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("value", Text))
                             Column("value2", Text))
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item, spider):

            try:
                ins_query = self.stack_items.insert().values(
                value=item['value'],
                value2=item['value2'],)
                self.connection.execute(ins_query)
            except IntegrityError:
                    print('THIS IS A DUP')
            return item

items.py:

class Filtered_Item(scrapy.Item):

    value= scrapy.Field()
    value2= scrapy.Field()

How can I modify the pipeline above to dynamically create and insert the filtered item's values instead of having these hard coded in like they are now?

like image 738
user1592380 Avatar asked Dec 12 '16 17:12

user1592380


1 Answers

There's actually a package out there that can help you out with this.

Check out: dataset: databases for lazy people

Here's an excerpt from the page:

Features

Automatic schema:

If a table or column is written that does not exist in the database, it will be created automatically.

Upserts:

Records are either created or updated, depending on whether an existing version can be found. Query helpers for simple queries such as all rows in a table or all distinct values across a set of columns.

Compatibility:

Being built on top of SQLAlchemy, dataset works with all major databases, such as SQLite, PostgreSQL and MySQL.

Scripted exports:

Data can be exported based on a scripted configuration, making the process easy and replicable.

like image 101
Alex Luis Arias Avatar answered Sep 20 '22 15:09

Alex Luis Arias