Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to insert multiple items into database when using Scrapy?

Nowadays most databases support inserting multiple records into database in one run. That is much faster than inserting records one by one, because only one transaction is need. The SQL syntax is similar to this:

INSERT INTO tbl_name (a,b,c)
VALUES(1,2,3), (4,5,6), (7,8,9);

Right now I'm using Python Scrapy on a small project. I use its item pipeline to store scraped data into a database. However, the logic behind item pipeline is that the relevant method will be called on each item. So it will always insert a single item at a time. How can I collect like 100 items and insert them in one run?

like image 408
Just a learner Avatar asked May 23 '26 10:05

Just a learner


1 Answers

You need to refactor your pipeline into something like this:

class DatabasePipeline(object):

    def open_spider(self, spider):
        #Create database connection
        ...
        #create items list
        self.items = []

    def process_item(self,item,spider):
        self.items.append(item)
        if len(self.items)==100:
            #constuct SQL query to insert multiple records
            ...
            #execute query and clean self.items
            self.items = []
        return item

    def close_spider(self,spider):
        #insert remaining records
        if self.items:
            #constuct SQL query to insert multiple records
            ...
            #execute query
        #close database connection
like image 150
Georgiy Avatar answered May 25 '26 22:05

Georgiy



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!