SQLAlchemy: What's the difference between flush() and commit()?

Tags:

sqlalchemy

What the difference is between flush() and commit() in SQLAlchemy?

I've read the docs, but am none the wiser - they seem to assume a pre-understanding that I don't have.

I'm particularly interested in their impact on memory usage. I'm loading some data into a database from a series of files (around 5 million rows in total) and my session is occasionally falling over - it's a large database and a machine with not much memory.

I'm wondering if I'm using too many commit() and not enough flush() calls - but without really understanding what the difference is, it's hard to tell!

598

asked Nov 17 '10 04:11

AP257

2 Answers

This does not strictly answer the original question but some people have mentioned that with session.autoflush = True you don't have to use session.flush()... And this is not always true.

If you want to use the id of a newly created object in the middle of a transaction, you must call session.flush().

# Given a model with at least this id class AModel(Base):    id = Column(Integer, primary_key=True)  # autoincrement by default on integer primary key  session.autoflush = True  a = AModel() session.add(a) a.id  # None session.flush() a.id  # autoincremented integer

This is because autoflush does NOT auto fill the id (although a query of the object will, which sometimes can cause confusion as in "why this works here but not there?" But snapshoe already covered this part).

One related aspect that seems pretty important to me and wasn't really mentioned:

Why would you not commit all the time? - The answer is atomicity.

A fancy word to say: an ensemble of operations have to all be executed successfully OR none of them will take effect.

For example, if you want to create/update/delete some object (A) and then create/update/delete another (B), but if (B) fails you want to revert (A). This means those 2 operations are atomic.

Therefore, if (B) needs a result of (A), you want to call flush after (A) and commit after (B).

Also, if session.autoflush is True, except for the case that I mentioned above or others in Jimbo's answer, you will not need to call flush manually.

answered Oct 10 '22 05:10

Romain Vincent

A Session object is basically an ongoing transaction of changes to a database (update, insert, delete). These operations aren't persisted to the database until they are committed (if your program aborts for some reason in mid-session transaction, any uncommitted changes within are lost).

The session object registers transaction operations with session.add(), but doesn't yet communicate them to the database until session.flush() is called.

session.flush() communicates a series of operations to the database (insert, update, delete). The database maintains them as pending operations in a transaction. The changes aren't persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit() does).

session.commit() commits (persists) those changes to the database.

flush() is always called as part of a call to commit() (1).

When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds. By default, Session objects autoflush their operations, but this can be disabled.

Hopefully this example will make this clearer:

#--- s = Session()  s.add(Foo('A')) # The Foo('A') object has been added to the session.                 # It has not been committed to the database yet,                 #   but is returned as part of a query. print 1, s.query(Foo).all() s.commit()  #--- s2 = Session() s2.autoflush = False  s2.add(Foo('B')) print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned                              #   as part of this query because it hasn't                              #   been flushed yet. s2.flush()                   # Now, Foo('B') is in the same state as                              #   Foo('A') was above. print 3, s2.query(Foo).all()  s2.rollback()                # Foo('B') has not been committed, and rolling                              #   back the session's transaction removes it                              #   from the session. print 4, s2.query(Foo).all()  #--- Output: 1 [<Foo('A')>] 2 [<Foo('A')>] 3 [<Foo('A')>, <Foo('B')>] 4 [<Foo('A')>]

answered Oct 10 '22 06:10

snapshoe

Related questions
                            
                                How can I get the source code of a Python function?
                            
                                Get exception description and stack trace which caused an exception, all as a string
                            
                                Filter dataframe rows if value in column is in a set list of values [duplicate]
                            
                                How can I use pickle to save a dict?
                            
                                How to delete a character from a string using Python
                            
                                What are the advantages of NumPy over regular Python lists?
                            
                                Get name of current script in Python
                            
                                How to count the occurrence of certain item in an ndarray?
                            
                                How to rename a file using Python
                            
                                Text Progress Bar in the Console [closed]
                            
                                Convert a Unicode string to a string in Python (containing extra symbols)
                            
                                How to comment out a block of code in Python [duplicate]
                            
                                Import error: No module name urllib2
                            
                                Proper way to use **kwargs in Python
                            
                                How do I correctly clean up a Python object?
                            
                                Error: " 'dict' object has no attribute 'iteritems' "
                            
                                Python: Checking if a 'Dictionary' is empty doesn't seem to work
                            
                                How to "test" NoneType in python?
                            
                                pip install fails with "connection error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)"
                            
                                Where do the Python unit tests go? [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With