
Schema Patching Practices

We've got a schema in Postgres, and we want to institute a good method to apply schema patches.

Currently, we have a series of DDL files that create the schema, tables, sequences, functions, etc. We also have a population script for test environments. Those files are all used to recreate our database environments, for development, testing, etc.

We also have a number of 'patch' files that correspond to versions of our system, e.g. patches/1.0.0.sql, patches/1.0.1.sql, etc. These files are used to update our staging and production databases.

This process works for us so far, but there has been some debate in-house about how best to patch the schema.

I'm curious what process others use to patch staging and production schemas, and how they manage versions of the database.

Thanks!

asked Oct 14 '22 by chadl


1 Answer

At work, for SQL Server, we write schema change scripts that first roll back the change to be made (idempotently, so the rollback section runs fine even if the schema change hasn't been applied yet), and then a section to apply the change. In T-SQL it's easy to peek in the system catalog or other tables to see if tables/columns/indices/rows already exist, and do nothing if they don't.
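The idempotent-rollback idiom described there might be sketched like this in T-SQL (the table and column names here are hypothetical, just to illustrate the catalog checks):

```sql
-- Rollback section: safe to run whether or not the change was applied.
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'location_coordinates')
    DROP TABLE location_coordinates;

IF EXISTS (SELECT 1 FROM sys.columns
           WHERE object_id = OBJECT_ID('location') AND name = 'box_10miles')
    ALTER TABLE location DROP COLUMN box_10miles;
```

Each statement checks the system catalog views before acting, so running the rollback against a database that never had the change is a no-op.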

In PostgreSQL, you're a bit more constrained in which commands you can simply send to the server, but on the other hand, DDL is transactional, so a half-applied schema change shouldn't happen. I've adapted the scheme I'm used to at work for my own little projects quite well (overkill? but even here I have a dev/test db and a "real" db), for example:

\echo Rolling back schema change #35

BEGIN;

DELETE FROM schema_version WHERE schema_id = 35;

DROP TABLE IF EXISTS location_coordinates;
DROP FUNCTION IF EXISTS location_coordinates_populate();

END;

\echo Applying schema change #35

BEGIN;

INSERT INTO schema_version(schema_id, description) VALUES(35, 'Add location_coordinates table');

CREATE TABLE location_coordinates(
 location_id INT PRIMARY KEY REFERENCES location(location_id),
 latitude FLOAT NOT NULL,
 longitude FLOAT NOT NULL,
 earth_coordinates earth NOT NULL,
 box_10miles cube NOT NULL
);

GRANT SELECT, INSERT, UPDATE, DELETE ON location_coordinates TO ui;

CREATE FUNCTION location_coordinates_populate() RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  new.earth_coordinates := ll_to_earth(new.latitude, new.longitude);
  new.box_10miles := earth_box(new.earth_coordinates, 10 * 1609.344);
  RETURN new;
END
$$;

CREATE TRIGGER location_coordinates_populate BEFORE INSERT OR UPDATE ON location_coordinates
 FOR EACH ROW EXECUTE PROCEDURE location_coordinates_populate();

INSERT INTO location_coordinates(location_id, latitude, longitude)
 SELECT location_id, latitude, longitude FROM location WHERE latitude IS NOT NULL AND longitude IS NOT NULL;

CREATE INDEX location_coordinates_10miles ON location_coordinates USING gist (box_10miles);

END;

\echo Done

This script can be run on the database just with "psql -f schema-changes/35.sql". By cutting the script down to everything before the "Applying..." message, I get the commands to roll it back. As you can see, the change maintains a metadata table, "schema_version", so I can see which changes have been applied. The entire change is done as a single transaction, data migration and all. Here I've used the "IF EXISTS" capability of the DROP commands to keep the rollback section happy even when the change is unapplied. I seem to recall one thing we did at work for Oracle was write schema changes as PL/SQL; you could perhaps have some functions in plpgsql to help with making changes.
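The "schema_version" table these scripts write to isn't shown anywhere; a minimal definition might look something like this (the exact column set is an assumption, since the answer only references schema_id and description):

```sql
-- Minimal metadata table for tracking applied schema changes.
CREATE TABLE schema_version(
    schema_id   INT PRIMARY KEY,                    -- the change number, e.g. 35
    description TEXT NOT NULL,                      -- human-readable summary
    applied_at  TIMESTAMPTZ NOT NULL DEFAULT now()  -- when the change landed
);
```

Because the INSERT into this table happens inside the same transaction as the DDL, the version row and the schema change commit or roll back together.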

Note that in the change above, where I'm migrating the "latitude" and "longitude" columns (which were nullable) out of "location" into a separate "location_coordinates" relation (and adding in the earthdistance stuff), I didn't drop the old columns. One thing we have to be careful of is to make schema changes backwards-compatible where possible. So I can apply this schema change before updating the app to use the new tables; I'd have a second change that drops the old columns, to apply after updating the app. At work, these would be done in two different release cycles, so during release X we still have the option of rolling back the app to release X-1 without having to roll back all the schema changes first, as well as being able to deploy schema changes in a separate window before the apps. (Technically I should have written a trigger so updates to the old table are synced to the new table, but I haven't because that's too much like work :))
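That sync trigger the answer waves at but skips might look roughly like this (entirely hypothetical, and it assumes a PostgreSQL version with INSERT ... ON CONFLICT; the existing BEFORE trigger on location_coordinates would still fill in the derived earthdistance columns):

```sql
-- Keep location_coordinates in step with the old columns on location
-- while both the old and new schema are live.
CREATE FUNCTION location_coordinates_sync() RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  IF new.latitude IS NOT NULL AND new.longitude IS NOT NULL THEN
    INSERT INTO location_coordinates(location_id, latitude, longitude)
    VALUES (new.location_id, new.latitude, new.longitude)
    ON CONFLICT (location_id)
    DO UPDATE SET latitude  = excluded.latitude,
                  longitude = excluded.longitude;
  END IF;
  RETURN new;
END
$$;

CREATE TRIGGER location_coordinates_sync AFTER INSERT OR UPDATE ON location
  FOR EACH ROW EXECUTE PROCEDURE location_coordinates_sync();
```

The second change (the one that drops the old columns after the app is updated) would also drop this trigger and function.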

We also have things like an application that spiders all our databases to see what's in the schema_version table and tracks changes, so people can see what changes were made without having to connect, and get an idea of the history of each change (we track "rolled back in dev", "applied in dev", etc). At work, our schema_version table also includes authorship information. A magic way of applying the version information from version control would be cool; one problem we have is that if a schema change gets applied in QA, for instance, then changed in Perforce, maybe no one notices. So a way to track that schema change 35 revision #4 was applied would be good.

One thing to note: schema changes for us are numbered independently of application versions. Obviously they are related (that's another thing the spidering app allows people to enter), but we try to have lots of small changes rather than a giant "here's everything for release X" patch. Schema changes are also used for things like adding new indices, so they might not be app-driven at all. In general, schema changes are "owned" by developers, not DBAs, although in the "create index" example above the DBA is basically acting in a developer role and owning the schema change. Yes, we insist on a high level of SQL ability from developers, although other groups in the company work a bit differently and give more work to the DB team.

answered Oct 18 '22 by araqnid