
Transactions and sqlalchemy

I am trying to figure out how to insert many (on the order of 100k) records into a database using SQLAlchemy in Python 3. Everything points to using transactions. However, I am slightly confused as to how that is done.

Some pages state that you get a transaction from connection.begin(), other places say it is session.begin(), and one page says it is session.create_transaction(), which doesn't exist.

Here is what I am trying to do:

def addToTable(listOfRows):
    engine = create_engine('postgresql+pypostgresql:///%s' % db,echo = False)
    Session = sessionmaker(bind = engine)
    session = Session()
    table = myTable(engine,session)

    for row in listOfRows:
       table.add(row)
    table.flush() ### ideally there would be a counter and you flush after a couple of thousand records


class myTable:

    def __init__(self, engine, session):
        self.engine = engine
        self.session = session
        self.transaction = createTransaction()  # Create transaction code here

    def add(self, row):
        newRow = tableRow(row)  ## This just creates a representation of a row in the DB
        self.transaction.add(newRow)
        self.transaction.flush()

    def flush(self):
        self.transaction.commit()
Lezan asked Nov 11 '13 10:11



2 Answers

I highly suggest that you work through both SQLAlchemy tutorials (ORM and Core) before continuing on your trip with SQLAlchemy. They are really helpful and explain many concepts. Afterwards, I suggest you read "Using the Session" in the documentation, which explains how the session fits into all of this.

To your problem, there are two solutions: one using the ORM and the other using the Core. The former is easier, the latter is faster. Let's take the easy road first. A transaction only wraps all your statements into a single operation. That is, if something fails, you can abort all of it and are not left with a half-finished state. So you most likely want a transaction, but it would work without one. Here is the simplest way:

with session.begin():
    session.add_all([tableRow(row) for row in listOfRows])

Depending on your data, SQLAlchemy might even be able to batch the INSERT statements so that several rows are sent at a time. Here is what's going on:

  • A transaction is started using session.begin
  • The data is added (using add_all, but a loop with multiple add would also be fine)
  • The session is committed when the with block exits. If something goes wrong, the transaction is rolled back and the exception propagates, so you can fix the error.
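The three steps above can be sketched end to end as follows. This is a minimal illustration, not the asker's actual schema: it assumes SQLAlchemy 1.4 or newer, an in-memory SQLite database, and a hypothetical TableRow model with a unique name column so that a failing batch can be demonstrated.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class TableRow(Base):
    # Hypothetical model standing in for the asker's tableRow class.
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

good_rows = [{"name": "row %d" % i} for i in range(1000)]
bad_rows = [{"name": "new"}, {"name": "row 0"}]  # second name violates the unique constraint

# Successful batch: committed when the inner with block exits.
with Session() as session:
    with session.begin():
        session.add_all([TableRow(**row) for row in good_rows])

# Failing batch: the commit raises, and the whole batch is rolled back.
with Session() as session:
    try:
        with session.begin():
            session.add_all([TableRow(**row) for row in bad_rows])
    except IntegrityError:
        pass  # nothing from bad_rows was stored, not even "new"

with Session() as session:
    total = session.query(TableRow).count()
    has_new = session.query(TableRow).filter_by(name="new").count()
```

After this runs, only the first batch is stored: the row named "new" is absent even though it was valid on its own, because it shared a transaction with the row that failed.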

So this is clearly a good way, but it is not the fastest way, because SQLAlchemy has to run every row through the ORM machinery, which adds overhead. If this is a one-time database initialization, you can avoid the ORM. In that case, instead of creating an ORM object (tableRow) for each row, you build a dictionary mapping column names to values (how depends on the data). Again you can use a context manager:

with engine.begin() as connection:
    connection.execute(tableRow.__table__.insert().
                       values([row_to_dict(row) for row in listOfRows]))

This would most likely be somewhat faster but also less convenient. It works the same way as the session example above, except that it constructs the statement from the Core and not the ORM.
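A self-contained sketch of the Core approach might look like the following. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the my_table definition and the row dictionaries are hypothetical. Note that it uses a slightly different form from the snippet above: passing the list of dictionaries as the second argument to execute() lets the driver run the INSERT once per row (executemany) instead of building one very large multi-row statement.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, func, select,
)

metadata = MetaData()

# Hypothetical table standing in for tableRow.__table__.
my_table = Table(
    "my_table",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata.create_all(engine)

# One dictionary per row, keyed by column name.
rows = [{"name": "row %d" % i} for i in range(5000)]

# engine.begin() opens a connection and a transaction; the transaction is
# committed on normal exit and rolled back if an exception is raised.
with engine.begin() as connection:
    connection.execute(my_table.insert(), rows)

with engine.connect() as connection:
    count = connection.execute(
        select(func.count()).select_from(my_table)
    ).scalar()
```

Which of the two forms is faster depends on the database driver, so it is worth timing both against the real target database.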

javex answered Sep 29 '22 11:09


UPDATE 2020-01-23

The answer from @javex is outdated.

TLDR: You can use the session directly without calling begin. Just make sure autocommit is set to False, which is the default.

Long answer:

See the documentation for the Session https://docs.sqlalchemy.org/en/13/orm/session_api.html

Warning

The Session.begin() method is part of a larger pattern of use with the Session known as autocommit mode. This is essentially a legacy mode of use and is not necessary for new applications. The Session normally handles the work of “begin” transparently, which in turn relies upon the Python DBAPI to transparently “begin” transactions; there is no need to explicitly begin transactions when using modern Session programming patterns. In its default mode of autocommit=False, the Session does all of its work within the context of a transaction, so as soon as you call Session.commit(), the next transaction is implicitly started when the next database operation is invoked. See Autocommit Mode for further background.
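As a rough sketch of that pattern applied to the original question, the following adds rows without ever calling begin(), flushes every couple of thousand records as the asker wanted, and commits once at the end. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; TableRow, add_to_table and batch_size are hypothetical names chosen for illustration.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class TableRow(Base):
    # Hypothetical model standing in for the asker's tableRow class.
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    name = Column(String)


def add_to_table(session, rows, batch_size=2000):
    """Add all rows in one transaction, flushing every batch_size rows."""
    try:
        for count, row in enumerate(rows, start=1):
            session.add(TableRow(**row))
            if count % batch_size == 0:
                session.flush()  # sends pending INSERTs; the transaction stays open
        session.commit()  # flushes the remainder and ends the transaction
    except Exception:
        session.rollback()  # discards everything added since the last commit
        raise


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    add_to_table(session, [{"name": "row %d" % i} for i in range(5000)])
    stored = session.query(TableRow).count()
```

The transaction starts implicitly on the first flush, so flush() only controls when the INSERTs are sent; nothing becomes permanent until commit().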

wasserholz answered Sep 29 '22 11:09