
Bulk inserts with Flask-SQLAlchemy

I'm using Flask-SQLAlchemy to do a rather large bulk insert of 60k rows. I also have a many-to-many relationship on this table, so I can't use db.engine.execute for this. Before inserting, I need to find similar items in the database, and change the insert to an update if a duplicate item is found.

I could do this check beforehand, and then do a bulk insert via db.engine.execute, but I need the primary key of the row upon insertion.

Currently, I am doing a db.session.add() and db.session.commit() on each insert, and I get a measly 3-4 inserts per second.

I ran a profiler to see where the bottleneck is, and it seems that the db.session.commit() is taking 60% of the time.

Is there a way to make this operation faster, perhaps by grouping commits, while still giving me the primary keys back?

This is what my models look like:

class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(1024), nullable=True)
    created = db.Column(db.DateTime())
    tags_relationship = db.relationship('Tag', secondary=tags, backref=db.backref('items', lazy='dynamic'))
    tags = association_proxy('tags_relationship', 'text')

class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.String(255))

My insert operation is:

for item in items:
    if duplicate:
        update_existing_item
    else:
        x = Item()
        x.title = "string"
        x.created = datetime.datetime.utcnow()
        for tag in tags:
            if not tag_already_exists:
                y = Tag()
                y.text = "tagtext"
                x.tags_relationship.append(y)
                db.session.add(y)
                db.session.commit()
            else:
                x.tags_relationship.append(existing_tag)
    db.session.add(x)
    db.session.commit()
Nik Barres asked May 19 '15 04:05



2 Answers

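Perhaps you should try calling db.session.flush() instead of committing after every row. A flush sends the pending INSERTs to the server inside the current transaction, which means any primary keys will be generated and populated on your objects. At the end you can call db.session.commit() once to actually commit the transaction.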
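As a rough illustration of the flush-then-commit idea, here is a minimal sketch using plain SQLAlchemy and an in-memory SQLite database rather than Flask-SQLAlchemy. The `item_tags` association table, the `insert_items` helper and the sample rows are assumptions made for the example; the question does not show its own association table, and duplicate detection for items is left out.

```python
from datetime import datetime, timezone

from sqlalchemy import (Column, DateTime, ForeignKey, Integer, String, Table,
                        create_engine)
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Hypothetical association table; the question's own `tags` table is not shown.
item_tags = Table(
    "item_tags",
    Base.metadata,
    Column("item_id", Integer, ForeignKey("item.id"), primary_key=True),
    Column("tag_id", Integer, ForeignKey("tag.id"), primary_key=True),
)


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    text = Column(String(255))


class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    title = Column(String(1024), nullable=True)
    created = Column(DateTime())
    tags_relationship = relationship("Tag", secondary=item_tags)


def insert_items(session, rows):
    """Insert (title, [tag texts]) rows and return the new primary keys.

    Hypothetical helper: flushes after each item, commits once at the end.
    """
    # Load existing tags once instead of querying per tag.
    tag_cache = {tag.text: tag for tag in session.query(Tag)}
    new_ids = []
    for title, tag_texts in rows:
        item = Item(title=title, created=datetime.now(timezone.utc))
        for text in tag_texts:
            tag = tag_cache.get(text)
            if tag is None:
                tag = tag_cache[text] = Tag(text=text)
            item.tags_relationship.append(tag)
        session.add(item)
        session.flush()  # sends the INSERTs; item.id is populated afterwards
        new_ids.append(item.id)
    session.commit()  # a single commit for the whole batch
    return new_ids


engine = create_engine("sqlite://")  # in-memory database for the sketch
Base.metadata.create_all(engine)

with Session(engine) as session:
    new_ids = insert_items(session, [("first", ["x", "y"]), ("second", ["y"])])
    tag_count = session.query(Tag).count()
```

Flushing per item keeps the primary key available immediately; if the keys are only needed after a batch, adding several items and flushing once per batch should reduce round trips further.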

wouter bolsterlee answered Nov 03 '22 23:11



I use the following code to quickly load the contents of a pandas DataFrame into SQLite. Note that it circumvents the ORM features of SQLAlchemy, so it does not hand back primary keys or populate relationships. myClass in this context is a db.Model-derived class that has a __tablename__ assigned to it. As the comment in the snippet mentions, I adapted it from the gist linked there.

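# df is a pandas DataFrame whose column names match the table's columns
records = df.to_dict('records')
# bulk save the dictionaries, circumventing the slow ORM interface
# c.f. https://gist.github.com/shrayasr/5df96d5bc287f3a2faa4
# passing a list of dicts makes this a single executemany-style INSERT
connection.engine.execute(
    myClass.__table__.insert(),
    records
)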
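For reference, `Engine.execute()` was removed in SQLAlchemy 2.0, so on newer versions the same Core-level bulk insert would go through a connection instead. The following is a minimal sketch under that assumption; the `item` table, its columns and the generated `records` are made up for the example and use an in-memory SQLite database.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, select)

metadata = MetaData()

# Hypothetical table standing in for myClass.__table__.
item = Table(
    "item",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("title", String(1024)),
)

engine = create_engine("sqlite://")  # in-memory database for the sketch
metadata.create_all(engine)

# One dict per row, the same shape df.to_dict('records') produces.
records = [{"title": f"item {i}"} for i in range(1000)]

# engine.begin() opens a transaction and commits it when the block exits.
with engine.begin() as conn:
    conn.execute(item.insert(), records)  # list of dicts -> executemany
    row_count = conn.execute(
        select(func.count()).select_from(item)
    ).scalar_one()
```

Because this path bypasses the ORM, it is best suited to rows that need neither their generated keys nor relationship handling at insert time.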
Henning answered Nov 04 '22 00:11
