
How does SQLAlchemy create_engine import Engine class?

I am relatively new to python and am experimenting with SQLAlchemy. I noticed that, to create an engine, I have to use the create_engine() function, imported via from sqlalchemy import create_engine.

Now, the create_engine function returns an instance of the sqlalchemy.engine.base.Engine class. However, I never imported this class; I only imported the create_engine function. So how does Python know about the sqlalchemy.engine.base.Engine class?

Joren Sips asked Jan 15 '18 10:01


1 Answer

You probably don't understand what importing does.

Python imports modules globally. There is a single structure, called sys.modules, that stores imported modules as a dictionary:

>>> import sys
>>> sys.modules
{'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': <module 'importlib._bootstrap' (frozen)>, '_imp': <module '_imp' (built-in)>, ...}

When you import SQLAlchemy, you import a package, a structure of multiple modules, where one import triggers more imports. All those imported modules are stored in that same place:

>>> import sqlalchemy
>>> [name for name in sys.modules if 'sqlalchemy' in name]
['sqlalchemy', 'sqlalchemy.sql', 'sqlalchemy.sql.expression', 'sqlalchemy.sql.visitors', 'sqlalchemy.util', 'sqlalchemy.util.compat', 'sqlalchemy.util._collections', 'sqlalchemy.util.langhelpers', 'sqlalchemy.exc', 'sqlalchemy.util.deprecations', 'sqlalchemy.sql.functions', 'sqlalchemy.sql.sqltypes', 'sqlalchemy.sql.elements', 'sqlalchemy.inspection', 'sqlalchemy.sql.type_api', 'sqlalchemy.sql.operators', 'sqlalchemy.sql.base', 'sqlalchemy.sql.annotation', 'sqlalchemy.processors', 'sqlalchemy.cprocessors', 'sqlalchemy.event', 'sqlalchemy.event.api', 'sqlalchemy.event.base', 'sqlalchemy.event.attr', 'sqlalchemy.event.registry', 'sqlalchemy.event.legacy', 'sqlalchemy.sql.schema', 'sqlalchemy.sql.selectable', 'sqlalchemy.sql.ddl', 'sqlalchemy.util.topological', 'sqlalchemy.sql.util', 'sqlalchemy.sql.dml', 'sqlalchemy.sql.default_comparator', 'sqlalchemy.sql.naming', 'sqlalchemy.events', 'sqlalchemy.pool', 'sqlalchemy.log', 'sqlalchemy.interfaces', 'sqlalchemy.util.queue', 'sqlalchemy.engine', 'sqlalchemy.engine.interfaces', 'sqlalchemy.sql.compiler', 'sqlalchemy.sql.crud', 'sqlalchemy.engine.base', 'sqlalchemy.engine.util', 'sqlalchemy.cutils', 'sqlalchemy.engine.result', 'sqlalchemy.cresultproxy', 'sqlalchemy.engine.strategies', 'sqlalchemy.engine.threadlocal', 'sqlalchemy.engine.url', 'sqlalchemy.dialects', 'sqlalchemy.types', 'sqlalchemy.schema', 'sqlalchemy.engine.default', 'sqlalchemy.engine.reflection']

Once a module is loaded from disk and added to that structure, Python doesn't need to load it a second time. Dots separate module names in a hierarchy, so everything starting with sqlalchemy. lives inside the sqlalchemy package as a tree structure. There are a lot of sqlalchemy modules here because this is a large project, and they were all loaded (directly or indirectly) by the root package module, sqlalchemy/__init__.py.
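
If you want to see the caching for yourself without installing anything, here is a minimal sketch using the standard library's json package in place of SQLAlchemy. The variable names (cached, json_again, via_importlib) are mine, chosen for illustration:

```python
import importlib
import sys

import json                     # first import: the module is loaded and cached
cached = sys.modules["json"]    # the cache entry is the module object itself

import json as json_again       # second import: a cache lookup, nothing is reloaded
via_importlib = importlib.import_module("json")

# Every route ends at the very same module object.
same_object = cached is json is json_again is via_importlib
print(same_object)  # True

# Submodules are cached under their dotted names.
import json.decoder
print("json.decoder" in sys.modules)  # True
```

The dotted key 'json.decoder' follows the same hierarchy rule as the sqlalchemy.* entries listed above.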

The other thing import does is bind a name in your current namespace. Each module is a 'global' namespace; all names defined in a module are visible throughout that module. Your Python script is run as the __main__ module, and all names in it are available to your script. If you create a module foo, then that is a separate namespace with its own names. import adds names from another module to your current namespace. And in Python, names are just references; the actual objects each of these names reference all live on a big pile in memory, called the heap.
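
As a rough illustration of the name binding, the sketch below runs an import statement against a plain dictionary that stands in for a module's global namespace. Using exec() with a dict like this is only a convenient way to make the namespace visible; it is not how you would normally import anything, and the variable name namespace is my own:

```python
import math

# A plain dict stands in for a module's global namespace.
namespace = {}
exec("from math import sqrt", namespace)

print("sqrt" in namespace)              # True: the import bound this one name
print("floor" in namespace)             # False: other names in math were not bound
print(namespace["sqrt"] is math.sqrt)   # True: a reference to the same object, not a copy
```

Only the name you asked for is bound, and it points at the very object the math module already holds.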

The line

from sqlalchemy import create_engine

first makes sure that the object sys.modules['sqlalchemy'] exists, and then adds the name create_engine to your current namespace as a reference to sqlalchemy.create_engine, as if the line create_engine = sys.modules['sqlalchemy'].create_engine was executed:

>>> sys.modules['sqlalchemy'].create_engine
<function create_engine at 0x10188bbf8>
>>> from sqlalchemy import create_engine
>>> create_engine is sys.modules['sqlalchemy'].create_engine
True

Again, all names in Python are just references to a big pile of objects in memory.

When you call the create_engine() function, the code for that function is executed, and that function has access to all the globals in the namespace it was defined in. In this case the function is defined in the sqlalchemy.engine module (the top-level sqlalchemy module itself has imported it as from sqlalchemy.engine import create_engine so you can access it from a more convenient location):

>>> create_engine.__module__
'sqlalchemy.engine'
>>> sys.modules['sqlalchemy.engine']
<module 'sqlalchemy.engine' from '/Users/mjpieters/Development/venvs/stackoverflow-3.6/lib/python3.6/site-packages/sqlalchemy/engine/__init__.py'>
>>> sorted(vars(sys.modules['sqlalchemy.engine']))
['BaseRowProxy', 'BufferedColumnResultProxy', 'BufferedColumnRow', 'BufferedRowResultProxy', 'Compiled', 'Connectable', 'Connection', 'CreateEnginePlugin', 'Dialect', 'Engine', 'ExceptionContext', 'ExecutionContext', 'FullyBufferedResultProxy', 'NestedTransaction', 'ResultProxy', 'RootTransaction', 'RowProxy', 'Transaction', 'TwoPhaseTransaction', 'TypeCompiler', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'base', 'connection_memoize', 'create_engine', 'ddl', 'default', 'default_strategy', 'engine_from_config', 'interfaces', 'reflection', 'result', 'strategies', 'threadlocal', 'url', 'util']

That list contains all the names defined in the module where create_engine is defined. The module was already loaded by code executed when you imported the sqlalchemy package. The function has access to all of them and can return any such object to you. You'll note that there is an Engine name defined there:

>>> sys.modules['sqlalchemy.engine'].Engine
<class 'sqlalchemy.engine.base.Engine'>
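
You can inspect the link between a function and its defining module directly: every Python function carries a __globals__ attribute that is the namespace dictionary of the module it was defined in. Here is a small sketch with json.dumps from the standard library standing in for create_engine (the variable name module_namespace is mine):

```python
import json
import sys

# A function remembers the name of the module it was defined in ...
print(json.dumps.__module__)  # json

# ... and its __globals__ attribute is that module's namespace dictionary.
module_namespace = vars(sys.modules[json.dumps.__module__])
print(json.dumps.__globals__ is module_namespace)  # True

# This script never imported JSONEncoder, yet json.dumps can reach it by name.
print("JSONEncoder" in json.dumps.__globals__)  # True
```

The same applies to create_engine and the Engine name in its module.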

So that object is already loaded into Python memory. All the function does is create an instance of that class for you and return it:

>>> engine = create_engine('sqlite:///:memory:')
>>> engine
Engine(sqlite:///:memory:)
>>> type(engine)
<class 'sqlalchemy.engine.base.Engine'>
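
To tie it all together, here is a toy reproduction of the same layout. Everything in it is hypothetical: mini_alchemy, its engine module, and the tiny Engine class are stand-ins I made up, written to a temporary directory, and they are not SQLAlchemy's actual code. The script imports only create_engine, yet gets back an instance of a class it never imported:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical names throughout).
root = Path(tempfile.mkdtemp())
package = root / "mini_alchemy"
package.mkdir()

# engine.py defines both the class and the factory function.
(package / "engine.py").write_text(
    "class Engine:\n"
    "    def __init__(self, url):\n"
    "        self.url = url\n"
    "\n"
    "def create_engine(url):\n"
    "    # Engine is looked up in this module's globals at call time.\n"
    "    return Engine(url)\n"
)

# __init__.py re-exports the factory at the package root.
(package / "__init__.py").write_text(
    "from mini_alchemy.engine import create_engine\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from mini_alchemy import create_engine  # the only name imported here

engine = create_engine("sqlite:///:memory:")
print(type(engine).__module__, type(engine).__name__)  # mini_alchemy.engine Engine
print("mini_alchemy.engine" in sys.modules)            # True
```

Importing the package ran its __init__.py, which loaded mini_alchemy.engine into sys.modules; create_engine then found Engine in its own module's globals.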

If you want to learn more about Python and names, I recommend you read Ned Batchelder's essay on Facts and myths about Python names and values.

Martijn Pieters answered Oct 09 '22 16:10