Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy: What's the difference between flush() and commit()?

What the difference is between flush() and commit() in SQLAlchemy?

I've read the docs, but am none the wiser - they seem to assume a pre-understanding that I don't have.

I'm particularly interested in their impact on memory usage. I'm loading some data into a database from a series of files (around 5 million rows in total) and my session is occasionally falling over - it's a large database and a machine with not much memory.

I'm wondering if I'm using too many commit() and not enough flush() calls - but without really understanding what the difference is, it's hard to tell!

like image 598
AP257 Avatar asked Nov 17 '10 04:11

AP257


People also ask

What is the difference between commit and flush?

commit() commits (persists) those changes to the database. flush() is always called as part of a call to commit() (1). When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds.

Does Session flush commit transaction?

flush() will synchronize your database with the current state of object/objects held in the memory but it does not commit the transaction. So, if you get any exception after flush() is called, then the transaction will be rolled back.

What is Session commit ()?

commit() is used to commit the current transaction. It always issues flush() beforehand to flush any remaining state to the database; this is independent of the “autoflush” setting. If no transaction is present, it raises an error.

What is Session flush ()?

flush(): Forces the session to flush. It is used to synchronize session data with database. When you call session. flush(), the statements are executed in database but it will not committed. If you don't call session.


2 Answers

This does not strictly answer the original question but some people have mentioned that with session.autoflush = True you don't have to use session.flush()... And this is not always true.

If you want to use the id of a newly created object in the middle of a transaction, you must call session.flush().

# Given a model with at least this id class AModel(Base):    id = Column(Integer, primary_key=True)  # autoincrement by default on integer primary key  session.autoflush = True  a = AModel() session.add(a) a.id  # None session.flush() a.id  # autoincremented integer 

This is because autoflush does NOT auto fill the id (although a query of the object will, which sometimes can cause confusion as in "why this works here but not there?" But snapshoe already covered this part).


One related aspect that seems pretty important to me and wasn't really mentioned:

Why would you not commit all the time? - The answer is atomicity.

A fancy word to say: an ensemble of operations have to all be executed successfully OR none of them will take effect.

For example, if you want to create/update/delete some object (A) and then create/update/delete another (B), but if (B) fails you want to revert (A). This means those 2 operations are atomic.

Therefore, if (B) needs a result of (A), you want to call flush after (A) and commit after (B).

Also, if session.autoflush is True, except for the case that I mentioned above or others in Jimbo's answer, you will not need to call flush manually.

like image 24
Romain Vincent Avatar answered Oct 10 '22 05:10

Romain Vincent


A Session object is basically an ongoing transaction of changes to a database (update, insert, delete). These operations aren't persisted to the database until they are committed (if your program aborts for some reason in mid-session transaction, any uncommitted changes within are lost).

The session object registers transaction operations with session.add(), but doesn't yet communicate them to the database until session.flush() is called.

session.flush() communicates a series of operations to the database (insert, update, delete). The database maintains them as pending operations in a transaction. The changes aren't persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit() does).

session.commit() commits (persists) those changes to the database.

flush() is always called as part of a call to commit() (1).

When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds. By default, Session objects autoflush their operations, but this can be disabled.

Hopefully this example will make this clearer:

#--- s = Session()  s.add(Foo('A')) # The Foo('A') object has been added to the session.                 # It has not been committed to the database yet,                 #   but is returned as part of a query. print 1, s.query(Foo).all() s.commit()  #--- s2 = Session() s2.autoflush = False  s2.add(Foo('B')) print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned                              #   as part of this query because it hasn't                              #   been flushed yet. s2.flush()                   # Now, Foo('B') is in the same state as                              #   Foo('A') was above. print 3, s2.query(Foo).all()  s2.rollback()                # Foo('B') has not been committed, and rolling                              #   back the session's transaction removes it                              #   from the session. print 4, s2.query(Foo).all()  #--- Output: 1 [<Foo('A')>] 2 [<Foo('A')>] 3 [<Foo('A')>, <Foo('B')>] 4 [<Foo('A')>] 
like image 87
snapshoe Avatar answered Oct 10 '22 06:10

snapshoe