What the difference is between flush()
and commit()
in SQLAlchemy?
I've read the docs, but am none the wiser - they seem to assume a pre-understanding that I don't have.
I'm particularly interested in their impact on memory usage. I'm loading some data into a database from a series of files (around 5 million rows in total) and my session is occasionally falling over - it's a large database and a machine with not much memory.
I'm wondering if I'm using too many commit()
and not enough flush()
calls - but without really understanding what the difference is, it's hard to tell!
commit() commits (persists) those changes to the database. flush() is always called as part of a call to commit() (1). When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds.
flush() will synchronize your database with the current state of object/objects held in the memory but it does not commit the transaction. So, if you get any exception after flush() is called, then the transaction will be rolled back.
commit() is used to commit the current transaction. It always issues flush() beforehand to flush any remaining state to the database; this is independent of the “autoflush” setting. If no transaction is present, it raises an error.
flush(): Forces the session to flush. It is used to synchronize session data with database. When you call session. flush(), the statements are executed in database but it will not committed. If you don't call session.
This does not strictly answer the original question but some people have mentioned that with session.autoflush = True
you don't have to use session.flush()
... And this is not always true.
If you want to use the id of a newly created object in the middle of a transaction, you must call session.flush()
.
# Given a model with at least this id class AModel(Base): id = Column(Integer, primary_key=True) # autoincrement by default on integer primary key session.autoflush = True a = AModel() session.add(a) a.id # None session.flush() a.id # autoincremented integer
This is because autoflush
does NOT auto fill the id (although a query of the object will, which sometimes can cause confusion as in "why this works here but not there?" But snapshoe already covered this part).
One related aspect that seems pretty important to me and wasn't really mentioned:
Why would you not commit all the time? - The answer is atomicity.
A fancy word to say: an ensemble of operations have to all be executed successfully OR none of them will take effect.
For example, if you want to create/update/delete some object (A) and then create/update/delete another (B), but if (B) fails you want to revert (A). This means those 2 operations are atomic.
Therefore, if (B) needs a result of (A), you want to call flush
after (A) and commit
after (B).
Also, if session.autoflush is True
, except for the case that I mentioned above or others in Jimbo's answer, you will not need to call flush
manually.
A Session object is basically an ongoing transaction of changes to a database (update, insert, delete). These operations aren't persisted to the database until they are committed (if your program aborts for some reason in mid-session transaction, any uncommitted changes within are lost).
The session object registers transaction operations with session.add()
, but doesn't yet communicate them to the database until session.flush()
is called.
session.flush()
communicates a series of operations to the database (insert, update, delete). The database maintains them as pending operations in a transaction. The changes aren't persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit()
does).
session.commit()
commits (persists) those changes to the database.
flush()
is always called as part of a call to commit()
(1).
When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds. By default, Session objects autoflush
their operations, but this can be disabled.
Hopefully this example will make this clearer:
#--- s = Session() s.add(Foo('A')) # The Foo('A') object has been added to the session. # It has not been committed to the database yet, # but is returned as part of a query. print 1, s.query(Foo).all() s.commit() #--- s2 = Session() s2.autoflush = False s2.add(Foo('B')) print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned # as part of this query because it hasn't # been flushed yet. s2.flush() # Now, Foo('B') is in the same state as # Foo('A') was above. print 3, s2.query(Foo).all() s2.rollback() # Foo('B') has not been committed, and rolling # back the session's transaction removes it # from the session. print 4, s2.query(Foo).all() #--- Output: 1 [<Foo('A')>] 2 [<Foo('A')>] 3 [<Foo('A')>, <Foo('B')>] 4 [<Foo('A')>]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With