
SQLAlchemy 'bulk_save_objects' vs 'add_all' underlying logic difference?

Considering following three methods of using sqlalchemy ORM to insert objects:

(1)

for obj in objects:
    session.add(obj)

(2)

session.add_all(objects)

(3)

session.bulk_save_objects(objects)

Suppose objects is a list of 50,000 objects.

  • Does method (1) form and send 50000 insert SQL queries?
  • Does method (2) form and send only 1 SQL query?
  • Does method (3) form and send only 1 SQL query?

I know these three methods differ a lot in speed. But what are the differences in the underlying implementation details?

asked Oct 23 '19 by AnnieFromTaiwan



1 Answer

(2) is essentially implemented as (1): add_all() calls add() for each object in the collection. Both may emit 50,000 INSERT statements during flush, if the ORM has to fetch generated values such as primary keys. They may even emit more, if those 50,000 objects have relationships that cascade.

In [4]: session.add_all([Foo() for _ in range(5)])

In [5]: session.commit()
BEGIN (implicit)
INSERT INTO foo DEFAULT VALUES RETURNING foo.id
{}
... (repeats 3 times)
INSERT INTO foo DEFAULT VALUES RETURNING foo.id
{}
COMMIT
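For context, transcripts like this can be reproduced with a minimal mapped class along these lines. This is a sketch, not the answer's original setup: the model is not shown in the answer, so the names here are assumed, and the exact log depends on the dialect (the RETURNING lines above come from PostgreSQL).

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Foo(Base):
    __tablename__ = "foo"
    # a database-generated primary key, so a plain Foo() has no
    # column values of its own to insert
    id = Column(Integer, primary_key=True)

# echo=True logs the emitted SQL, similar to the transcript above
engine = create_engine("sqlite://", echo=True)
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
```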

If you provide primary keys and other DB generated values beforehand, then the Session can combine separate inserts to a single "executemany" operation when the arguments match.

In [8]: session.add_all([Foo(id=i) for i in range(5)])

In [9]: session.commit()
BEGIN (implicit)
INSERT INTO foo (id) VALUES (%(id)s)
({'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4})
COMMIT

If your DB-API driver implements executemany() (or an equivalent) in a way that issues a single statement carrying multiple rows of data, then it can result in a single query. For example, after enabling executemany_mode='values', the PostgreSQL log contains the following for the above:

LOG: statement: INSERT INTO foo (id) VALUES (0),(1),(2),(3),(4) 
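For reference, executemany_mode is an option of the psycopg2 dialect passed to create_engine(). A sketch of enabling it, with an illustrative connection URL; note that 'values' is the mode name from the SQLAlchemy 1.3 era this answer was written in, and later versions renamed the modes:

```python
from sqlalchemy import create_engine

# psycopg2-dialect option: rewrites compatible INSERTs into a single
# multi-row VALUES statement during executemany
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/mydb",  # illustrative URL
    executemany_mode="values",
)
```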

The bulk operation skips most of the Session machinery — such as persisting related objects — in exchange for performance gains. For example by default it does not fetch default values, such as primary keys, which allows it to try and batch changes to fewer "executemany" operations where the operation and arguments match.

In [12]: session.bulk_save_objects([Foo() for _ in range(5)])
BEGIN (implicit)
INSERT INTO foo DEFAULT VALUES
({}, {}, {}, {}, {})

In [13]: session.commit()
COMMIT

It may still emit multiple statements, again depending on the data, and the DB-API driver in use. The documentation is a good read.

With the psycopg2 fast execution helpers enabled, the above produces the following in the PostgreSQL log:

LOG: statement: INSERT INTO foo DEFAULT VALUES;INSERT INTO foo DEFAULT VALUES;INSERT INTO foo DEFAULT VALUES;INSERT INTO foo DEFAULT VALUES;INSERT INTO foo DEFAULT VALUES 

In other words, multiple statements have been joined into a "single" statement sent to the server.

So, in the end the answer to all 3 is "it depends", which of course may seem frustrating.
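One way to check the "it depends" on your own setup is to count the statements the engine actually emits with an event listener. A sketch using an in-memory SQLite database; the exact counts vary with the SQLAlchemy version and driver, but bulk_save_objects should never need more round trips than plain add_all:

```python
from sqlalchemy import Column, Integer, create_engine, event
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

inserts = []

@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    # record each INSERT cursor execution and whether it was an executemany
    if statement.startswith("INSERT"):
        inserts.append(executemany)

# (1)/(2): the ORM may need to fetch each generated primary key,
# which can force one INSERT per object on some versions/drivers
session.add_all([Foo() for _ in range(5)])
session.commit()
orm_count = len(inserts)

inserts.clear()

# (3): bulk_save_objects skips fetching defaults, so matching rows
# can be batched into fewer executemany operations
session.bulk_save_objects([Foo() for _ in range(5)])
session.commit()
bulk_count = len(inserts)

print(orm_count, bulk_count)
```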

answered Sep 24 '22 by Ilja Everilä