How to make SQLAlchemy
in Tornado
to be async
? I found example for MongoDB on async mongo example but I couldn't find anything like motor
for SQLAlchemy
. Does anyone know how to make SQLAlchemy
queries to execute with tornado.gen
( I am using MySQL
below SQLAlchemy
, at the moment my handlers reads from database and return result, I would like to make this async).
The SQLAlchemy event system is not directly exposed by the asyncio extension, meaning there is not yet an “async” version of a SQLAlchemy event handler.
At the ORM level, the speed issues are because creating objects in Python is slow, and the SQLAlchemy ORM applies a large amount of bookkeeping to these objects as it fetches them, which is necessary in order for it to fulfill its usage contract, including unit of work, identity map, eager loading, collections, etc.
If you want to view your data in a more schema-centric view (as used in SQL), use Core. If you have data for which business objects are not needed, use Core. If you view your data as business objects, use ORM. If you are building a quick prototype, use ORM.
SQLAlchemy supports the widest variety of database and architectural designs as is reasonably possible. Unit Of Work. The Unit Of Work system, a central part of SQLAlchemy's Object Relational Mapper (ORM), organizes pending insert/update/delete operations into queues and flushes them all in one batch.
tornado-sqlalchemyhandles the following problems/use-cases: Boilerplate- Tornado does not bundle code to handle database connections. So developer building apps using Tornado end up writing their own code - code to establish database connections, initialize engines, get/teardown sessions, and so on.
The asyncio extension as of SQLAlchemy 1.4.3 can now be considered to be beta levelsoftware. API details are subject to change however at this point it is unlikely for there to be significant backwards-incompatible changes. See also Asynchronous IO Support for Core and ORM- initial feature announcement
It is much more efficient to get the laundry started in the washing machine, then cook dinner while the laundry is working. That in essence is what the async / await syntax does in python’s asyncio library and by extension tornado which uses asyncio behind the scenes.
Asynchronous operations in Tornado generally return placeholder objects ( Futures ), with the exception of some low-level components like the IOLoop that use callbacks. Futures are usually transformed into their result with the await or yield keywords.
ORMs are poorly suited for explicit asynchronous programming, that is, where the programmer must produce explicit callbacks anytime something that uses network access occurs. A primary reason for this is that ORMs make extensive use of the lazy loading pattern, which is more or less incompatible with explicit async. Code that looks like this:
user = Session.query(User).first() print user.addresses
will actually emit two separate queries - one when you say first()
to load a row, and the next when you say user.addresses
, in the case that the .addresses
collection isn't already present, or has been expired. Essentially, nearly every line of code that deals with ORM constructs might block on IO, so you'd be in extensive callback spaghetti within seconds - and to make matters worse, the vast majority of those code lines won't actually block on IO, so all the overhead of connecting callbacks together for what would otherwise be simple attribute access operations will make your program vastly less efficient too.
A major issue with explicit asynchronous models is that they add tremendous Python function call overhead to complex systems - not just on the user-facing side like you get with lazy loading, but on the internal side as well regarding how the system provides abstraction around the Python database API (DBAPI). For SQLAlchemy to even have basic async support would impose a severe performance penalty on the vast majority of programs that don't use async patterns, and even those async programs that are not highly concurrent. Consider SQLAlchemy, or any other ORM or abstraction layer, might have code like the following:
def execute(connection, statement): cursor = connection.cursor() cursor.execute(statement) results = cursor.fetchall() cursor.close() return results
The above code performs what seems to be a simple operation, executing a SQL statement on a connection. But using a fully async DBAPI like psycopg2's async extension, the above code blocks on IO at least three times. So to write the above code in explicit async style, even when there's no async engine in use and the callbacks aren't actually blocking, means the above outer function call becomes at least three function calls, instead of one, not including the overhead imposed by the explicit asynchronous system or the DBAPI calls themselves. So a simple application is automatically given a penalty of 3x the function call overhead surrounding a simple abstraction around statement execution. And in Python, function call overhead is everything.
For these reasons, I continue to be less than excited about the hype surrounding explicit async systems, at least to the degree that some folks seem to want to go all async for everything, like delivering web pages (see node.js). I'd recommend using implicit async systems instead, most notably gevent, where you get all the non-blocking IO benefits of an asynchronous model and none of the structural verbosity/downsides of explicit callbacks. I continue to try to understand use cases for these two approaches, so I'm puzzled by the appeal of the explicit async approach as a solution to all problems, i.e. as you see with node.js - we're using scripting languages in the first place to cut down on verbosity and code complexity, and explicit async for simple things like delivering web pages seems to do nothing but add boilerplate that can just as well be automated by gevent or similar, if blocking IO is even such a problem in a case like that (plenty of high volume websites do fine with a synchronous IO model). Gevent-based systems are production proven and their popularity is growing, so if you like the code automation that ORMs provide, you might also want to embrace the async-IO-scheduling automation that a system like gevent provides.
Update: Nick Coghlan pointed out his great article on the subject of explicit vs. implicit async which is also a must read here. And I've also been updated to the fact that pep-3156 now welcomes interoperability with gevent, reversing its previously stated disinterest in gevent, largely thanks to Nick's article. So in the future I would recommend a hybrid of Tornado using gevent for the database logic, once the system of integrating these approaches is available.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With