Is there a performance advantage (or disadvantage) when using default instead of server_default for mapping table column default values when using SQLAlchemy with PostgreSQL?
My understanding is that default renders the expression in the INSERT (usually) and that server_default places the expression in the CREATE TABLE statement. Seems like server_default is analogous to typical handling of defaults directly in the db, such as:
CREATE TABLE example (
    id serial PRIMARY KEY,
    updated timestamptz DEFAULT now()
);
...but it is not clear to me if it is more efficient to handle defaults on INSERT or via table creation.

Would there be any performance improvement or degradation for row inserts if each of the default parameters in the example below were changed to server_default?
from uuid import uuid4

from sqlalchemy import Column, Boolean, DateTime, Integer
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func

Base = declarative_base()

class Item(Base):
    __tablename__ = 'item'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    count = Column(Integer, nullable=False, default=0)
    flag = Column(Boolean, nullable=False, default=False)
    updated = Column(DateTime(timezone=True), nullable=False, default=func.now())
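For concreteness, here is a sketch of the server_default version being asked about (the class and table names are invented for the comparison, and gen_random_uuid() assumes PostgreSQL 13+, or the pgcrypto extension on older versions):

from sqlalchemy import Column, Boolean, DateTime, Integer, text
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func

Base = declarative_base()

class ItemServerSide(Base):
    __tablename__ = 'item_server_side'

    # gen_random_uuid() is built in from PostgreSQL 13; earlier versions need
    # the pgcrypto extension (or uuid-ossp's uuid_generate_v4()).
    id = Column(UUID(as_uuid=True), primary_key=True,
                server_default=text('gen_random_uuid()'))
    count = Column(Integer, nullable=False, server_default=text('0'))
    flag = Column(Boolean, nullable=False, server_default=text('false'))
    updated = Column(DateTime(timezone=True), nullable=False,
                     server_default=func.now())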
NOTE: The best explanation I found so far for when to use default instead of server_default does not address performance (see Mike Bayer's SO answer on the subject). My oversimplified summary of that explanation is that default is preferred over server_default when...

...so the question remains as to whether performance should be considered when choosing between default and server_default?
It is impossible to give you a 'this is faster' answer, because performance per default value expression can vary widely, both on the server and in Python. A function to retrieve the current time behaves differently from a scalar default value.
Next, you must realise that defaults can be provided in five different ways:

1. Client-side Python scalar defaults: a fixed value such as 0 or True. The value is used in an INSERT statement.
2. Client-side Python function defaults: a callable such as uuid4, invoked once per row; the result is used in the INSERT statement.
3. Client-side SQL expression defaults: an SQL expression such as func.now(), rendered into the INSERT statement and executed on the server.
4. Server-side DDL expression defaults (server_default): SQL expressions that are part of the CREATE TABLE statement. The server uses them to fill in values for columns omitted from INSERT statements, or when a column value is set to DEFAULT in an INSERT or UPDATE statement.
5. Server-side implicit defaults or triggers: values produced entirely inside the database, outside the table DDL that SQLAlchemy renders.

Note that when it comes to a SQL expression determining the default value, be that a client-side SQL expression, a server-side DDL expression, or a trigger, it makes very little difference to a database where the default value expression is coming from. The query executor needs to know how to produce values for a given column; once that's parsed out of the DML statement or the schema definition, the server still has to execute the expression for each row.
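As a rough sketch (the model and column names here are invented for illustration), the five flavours map onto SQLAlchemy column definitions like this:

from uuid import uuid4

from sqlalchemy import Column, DateTime, FetchedValue, Integer
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func

Base = declarative_base()

class DefaultFlavours(Base):
    __tablename__ = 'default_flavours'

    pk = Column(Integer, primary_key=True)
    # 1. client-side scalar default; the literal 0 is placed in the INSERT
    scalar_col = Column(Integer, default=0)
    # 2. client-side Python function; uuid4() is called for each row and the
    #    result is placed in the INSERT
    function_col = Column(UUID(as_uuid=True), default=uuid4)
    # 3. client-side SQL expression; now() is rendered into the INSERT and
    #    executed by the server
    sql_expr_col = Column(DateTime(timezone=True), default=func.now())
    # 4. server-side DDL expression; DEFAULT now() becomes part of the
    #    CREATE TABLE statement
    ddl_col = Column(DateTime(timezone=True), server_default=func.now())
    # 5. server-side implicit default or trigger; FetchedValue() only tells
    #    SQLAlchemy that the database produces the value somehow
    implicit_col = Column(DateTime(timezone=True), server_default=FetchedValue())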
Choosing between these options is rarely going to be based on performance alone; performance should be at most one of several aspects you consider. There are many factors involved here:

- A default with a scalar or Python function directly produces a Python default value, then sends the new value to the server when inserting. Python code can access the default value before the data is inserted into the database.
- A client-side SQL expression, a server_default value, and server-side implicit defaults and triggers all have the server generate the default, which must then be fetched by the client if you want to be able to access it in the same SQLAlchemy session. You can't access the value until the object has been inserted into the database.
- Depending on the exact query and database support, SQLAlchemy may have to make extra SQL queries to either generate a default before the INSERT statement or run a separate SELECT afterwards to fetch the defaults that have been inserted. You can control when this happens (directly when inserting, or on first access after flushing, with the eager_defaults mapper configuration; see the sketch after this list).
- A server_default or other default attached to the schema (such as a trigger) ensures that all clients use the same defaults regardless of platform, while defaults implemented in Python can't be used by other platforms.
- When using PostgreSQL, SQLAlchemy can make use of the RETURNING clause for DML statements, which gives a client access to server-side generated defaults in a single step (also shown in the sketch below).
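To make the last two points concrete, here is a minimal sketch (the Event model, connection URL, and engine are invented for the example) of eager_defaults combined with PostgreSQL's RETURNING support:

from sqlalchemy import Column, DateTime, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
from sqlalchemy.sql import func

Base = declarative_base()

class Event(Base):
    __tablename__ = 'event'

    id = Column(Integer, primary_key=True)
    created = Column(DateTime(timezone=True), nullable=False,
                     server_default=func.now())
    # Fetch server-generated defaults during the flush itself; on PostgreSQL
    # this folds into the INSERT as a RETURNING clause, avoiding a separate
    # SELECT on first attribute access.
    __mapper_args__ = {'eager_defaults': True}

engine = create_engine('postgresql+psycopg2://user:secret@localhost/example')
Base.metadata.create_all(engine)

session = Session(bind=engine)
event = Event()
session.add(event)
session.flush()       # emits INSERT ... RETURNING created
print(event.created)  # already populated; no extra round trip needed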
So when using a server_default column default that calculates a new value for each row (not a scalar value), you save a small amount of Python-side time and a small amount of network bandwidth, as you are not sending data for that column over to the database. The database could be faster at creating that same value, or it could be slower; that largely depends on the type of operation. If you need access to the generated default value from Python, in the same transaction, you then have to wait for a return stream of data, parsed out by SQLAlchemy. All of these details can become insignificant compared to everything else that happens around inserting or updating rows, however.
Do understand that an ORM is not suitable for high-performance bulk row inserts or updates; quoting from the SQLAlchemy Performance FAQ entry:
The SQLAlchemy ORM uses the unit of work pattern when synchronizing changes to the database. This pattern goes far beyond simple “inserts” of data. It includes that attributes which are assigned on objects are received using an attribute instrumentation system which tracks changes on objects as they are made, includes that all rows inserted are tracked in an identity map which has the effect that for each row SQLAlchemy must retrieve its “last inserted id” if not already given, and also involves that rows to be inserted are scanned and sorted for dependencies as needed. Objects are also subject to a fair degree of bookkeeping in order to keep all of this running, which for a very large number of rows at once can create an inordinate amount of time spent with large data structures, hence it’s best to chunk these.
Basically, unit of work is a large degree of automation in order to automate the task of persisting a complex object graph into a relational database with no explicit persistence code, and this automation has a price.
ORMs are basically not intended for high-performance bulk inserts - this is the whole reason SQLAlchemy offers the Core in addition to the ORM as a first-class component.
Because an ORM like SQLAlchemy comes with a hefty overhead price, any performance difference between a server-side and a Python-side default quickly disappears in the noise of ORM operations.
So if you are concerned about performance for large-quantity insert or update operations, you would want to use bulk operations for those, and enable the psycopg2 batch execution helpers to really get a speed boost. When using these bulk operations, I'd expect server-side defaults to improve performance just by saving bandwidth moving row data from Python to the server, but how much depends on the exact nature of the default values.
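As a sketch of that combination (the connection URL is a placeholder, and executemany_mode requires SQLAlchemy 1.3+; older versions spelled this use_batch_mode=True), reusing the Item model from the question:

from sqlalchemy import create_engine

# Enable psycopg2's fast executemany helper (execute_batch() under the hood).
engine = create_engine(
    'postgresql+psycopg2://user:secret@localhost/example',
    executemany_mode='batch',
)

# A Core bulk insert: a single executemany() over many parameter sets;
# client-side defaults are still applied per row, but without ORM bookkeeping.
with engine.begin() as conn:
    conn.execute(
        Item.__table__.insert(),
        [{'count': i, 'flag': False} for i in range(10_000)],
    )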
If ORM insert and update performance outside of bulk operations is a big issue for you, you need to test your specific options. I'd start with the SQLAlchemy examples.performance package and add your own test suite using two models that differ only in a single server_default and default configuration.
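A minimal timing harness along those lines might look like this (a sketch, not the examples.performance API; it assumes the tables already exist and reuses models like Item and ItemServerSide from above):

import time

from sqlalchemy.orm import Session

def time_inserts(model, engine, n=10_000):
    """Insert n blank rows of model and return the elapsed wall-clock time."""
    start = time.perf_counter()
    session = Session(bind=engine)
    session.add_all([model() for _ in range(n)])
    session.commit()
    session.close()
    return time.perf_counter() - start

# e.g. print(time_inserts(Item, engine), time_inserts(ItemServerSide, engine))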
There's something else important to consider beyond just comparing the performance of the two.

If you need to add a new NOT NULL column create_at to an existing table User that already has some data in it, default will not work.

With default, upgrading the database fails with an error saying that a NULL value cannot be inserted for the rows already in the table. This causes significant trouble if you want to keep your data, even if it is just test data.

With server_default, the database fills in the current DateTime value for all pre-existing rows during the upgrade.

So in this case, only server_default will work.
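For example, an Alembic migration along those lines might look like this (a sketch; the table and column names follow the answer above, and Alembic itself is an assumption about the migration tool in use):

import sqlalchemy as sa
from alembic import op

def upgrade():
    # server_default lets PostgreSQL backfill the new NOT NULL column for the
    # rows that already exist, instead of failing with a NULL violation.
    op.add_column(
        'user',
        sa.Column(
            'create_at',
            sa.DateTime(timezone=True),
            nullable=False,
            server_default=sa.func.now(),
        ),
    )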