Consider the following three methods of using the SQLAlchemy ORM to insert objects:
(1)

    for obj in objects:
        session.add(obj)

(2)

    session.add_all(objects)

(3)

    session.bulk_save_objects(objects)
Suppose the length of objects is 50,000. Will (1) result in 50,000 INSERT SQL queries, while (2) and (3) each result in a single SQL query? I know these three methods differ a lot in speed, but what are the differences in the underlying implementation details?
(2) is essentially implemented as (1): add_all() simply loops over add(). Both may emit 50,000 INSERT statements during flush if the ORM has to fetch generated values such as primary keys, and they may emit even more if those 50,000 objects have relationships that cascade.
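For reference, the transcripts below assume a minimal model and session along these lines; the Foo model, the connection URL, and echo=True (which logs each emitted statement) are illustrative assumptions, not from the original post:

    from sqlalchemy import create_engine, Column, Integer
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    # A trivial table with only a server-generated primary key, so the
    # INSERTs below are as simple as possible.
    class Foo(Base):
        __tablename__ = "foo"
        id = Column(Integer, primary_key=True)

    engine = create_engine("postgresql://localhost/test", echo=True)
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()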
    In [4]: session.add_all([Foo() for _ in range(5)])

    In [5]: session.commit()
    BEGIN (implicit)
    INSERT INTO foo DEFAULT VALUES RETURNING foo.id
    {}
    ... (repeats 3 times)
    INSERT INTO foo DEFAULT VALUES RETURNING foo.id
    {}
    COMMIT
If you provide primary keys and other DB-generated values beforehand, the Session can combine the separate inserts into a single "executemany" operation when the arguments match:
    In [8]: session.add_all([Foo(id=i) for i in range(5)])

    In [9]: session.commit()
    BEGIN (implicit)
    INSERT INTO foo (id) VALUES (%(id)s)
    ({'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4})
    COMMIT
If your DB-API driver implements executemany() or an equivalent in a way that lets it issue a single statement carrying multiple rows of data, this can result in a single query. For example, after enabling executemany_mode='values', the PostgreSQL log contains the following for the above:

    LOG: statement: INSERT INTO foo (id) VALUES (0),(1),(2),(3),(4)
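That mode is enabled when creating the engine on the psycopg2 dialect (SQLAlchemy 1.3+); a minimal sketch, with a placeholder connection URL:

    # Sketch: ask the psycopg2 dialect to use its execute_values fast
    # execution helper; the URL here is a placeholder, not from the post.
    from sqlalchemy import create_engine

    engine = create_engine(
        "postgresql+psycopg2://localhost/test",
        executemany_mode="values",  # fold executemany() INSERTs into multi-VALUES statements
    )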
The bulk operation of (3) skips most of the Session machinery, such as persisting related objects, in exchange for performance gains. For example, by default it does not fetch generated values such as primary keys, which lets it try to batch changes into fewer "executemany" operations where the statement and arguments match:
    In [12]: session.bulk_save_objects([Foo() for _ in range(5)])
    BEGIN (implicit)
    INSERT INTO foo DEFAULT VALUES
    ({}, {}, {}, {}, {})

    In [13]: session.commit()
    COMMIT
It may still emit multiple statements, again depending on the data and on the DB-API driver in use. The documentation on bulk operations is a good read.
With psycopg2's fast execution helpers enabled, the above produces the following in the PostgreSQL log:

    LOG: statement: INSERT INTO foo DEFAULT VALUES;INSERT INTO foo DEFAULT VALUES;INSERT INTO foo DEFAULT VALUES;INSERT INTO foo DEFAULT VALUES;INSERT INTO foo DEFAULT VALUES
In other words, multiple statements have been joined into a "single" statement sent to the server.
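If you do need the generated primary keys back from a bulk save, bulk_save_objects() accepts return_defaults=True; per the documentation it then fetches the defaults, but with row-by-row INSERTs, giving up most of the batching shown above. A small sketch:

    # Sketch: return_defaults=True populates server-generated values such as
    # primary keys, at the cost of emitting one INSERT per row.
    objects = [Foo() for _ in range(5)]
    session.bulk_save_objects(objects, return_defaults=True)
    print([obj.id for obj in objects])  # ids are now populated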
So in the end the answer for all three is "it depends", which may of course seem frustrating.
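If in doubt, measure against your own schema, driver, and settings. A rough timing harness, assuming the Foo model and session from the sketch earlier (the 50,000-object count mirrors the question):

    import time

    def timed(label, fn):
        session.rollback()   # start each run from a clean transaction
        start = time.time()
        fn()
        session.flush()      # force any pending INSERTs to execute
        print(label, time.time() - start)
        session.rollback()   # discard the inserted rows between runs

    timed("(1) add() in a loop", lambda: [session.add(Foo()) for _ in range(50000)])
    timed("(2) add_all()", lambda: session.add_all([Foo() for _ in range(50000)]))
    timed("(3) bulk_save_objects()", lambda: session.bulk_save_objects([Foo() for _ in range(50000)]))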