Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL development workflow

I am starting to build a new database for my project using PostgreSQL. (I am new to PostgreSQL and database by the way.)

I think my development workflow is very bad, and here is a part of it:

  1. create table/view/function with pgAdmin.
  2. determine the name of the file before saving the code.
    The goal is to be able to recreate the database automatically by running all the saved scripts,
    I need to know the order to run these scripts for dependency reason.
    So I add a number for each file indicating the order. for example: 001_create_role_user.ddl, 002_create_database_project.ddl, 013_user_table.ddl
  3. save the code.
  4. commit the file to repository using GIT.

Here are some bads I can think of:

  • I can easily forget what changes I made. For example, created a new type, or edited comment
  • It is hard to determine a name (order) for the file.
  • Change the code would be a pain in the ass, especially when the new code changes the order.

So my workflow is bad. I was wondering what other Postgres developers' workflow looks like.

Are there any good tools (free or cheap) for editing and saving scripts? good IDE maybe?

It would be great if I can create automated unit tests for the database.

Any tool for recreating the database? CI server tool?

Basically I am looking for any advice, good practice, or good tool for database development.

(Sorry, this question may not fit for the Q&A format, but I do not know where else to ask this question.)

like image 876
Xin Avatar asked Jul 31 '13 03:07

Xin


3 Answers

Check out liquibase. We use it in the company I work at to setup our PostgreSQL database. It's open source, easy to use and the changelog file you end up with can be added to source control. Each changeset gets an id, so that each changeset is only run once. You end up with two extra tables for tracking the changes to the database when it's run.

While it's DB agnostic, you can use PostgreSQL SQL directly in each changeset and each changeset can have it's own comments.

The only caveat from having used it is that you have to caution yourself and others not to re-use a changeset once it's been applied to a database. Any changes to an already applied changeset result in a different checksum (even whitespace) which can cause liquibase to abort it's updates. This can end up in failed DB updates in the field, so each update to any of the changelogs should be tested locally first. Instead all changes, however minor should be inserted into a new changeset with a new id. They have a changeset sub-tag called "validCheckSum" to let you work around this, but I think it's better to try to enforce always making a new changeset.

Here are the doc links for creating a table and creating a view for example.

like image 149
n0741337 Avatar answered Nov 16 '22 10:11

n0741337


Well, your question is actually quite relevant to any database developer, and, if I understand it correctly, there is another way to get to your desired results.

One interesting thing to mention is that your idea of separating different changes into different files is the concept of migrations of Ruby On Rails. You might even be able to use the rake utility to keep track of a workflow like yours.

But now to what I think migh be your solution. PostgreSQL, and others to be sincere, have specific utilities to handle data and schemas like what you probably need.

The pg_dumpall command-line executable will dump the whole database into a file or the console, in a way that the psql utility can simply "reload" into the same, or into another (virgin) database.

So, if you want to keep only the current schema (no data!) of a running database cluster, you can, as the postgres-process owner user:

$ pg_dumpall --schema-only > schema.sql

Now the schema.sql will hold exactly the same users/databases/tables/triggers/etc, but not data. If you want to have a "full-backup" style dump (and that's one way to take a full backup of a database), just remove the "--schema-only" option from the command line.

You can reload the file into another (should be virgin, you might mess up a database with other data doing this):

$ psql -f schema.sql postgres

Now if you only want to dump one database, one table, etc. you should use the pg_dump utility.

$ pg_dump --schema-only <database> > database-schema.sql

And then, to reload the database into a running postgresql server:

$ psql <database> < database-schema.sql

As for version control, you can just keep the schema.sql file under it, and just dump the database again into the file before every vc commit. So at some particular version control state will you have the code and the working database schema that goes with it.

Oh, and all the tools I mentioned are free, and pg_dump and pg_dumpall come with the standard PostgreSQL installation.

Hope that helps,

Marco

like image 44
Marco Poli Avatar answered Nov 16 '22 12:11

Marco Poli


You're not far off. I'm a Java developer, not a DBA, but building out the database as a project grows is an important task for the teams I've been on, and here's how I've seen it done best:

  1. All DB changes are driven by DDL (SQL create, alter, or delete statements) plain text scripts. No changes through the DB client. Use a text editor that supports syntax highlighting like vim or notepad++, as the highlighting can help you find errors before you run the script.
  2. Use a number at the beginning of each DDL script to define the order that scripts are run in. Base scripts have lower numbers. As the project grows, use alter new alter scripts to change the table, don't redefine the table in the initial script.
  3. Use a script and the psql client to load the DDL scripts from lowest to highest. Here's the bash script we use. You can use it as a base for a .bat script on windows.

#!/bin/bash

export PGDATABASE=your_db export

export PGUSER=your_user export

export PGPASSWORD=your_password

for SQL_SCRIPT in $( find ./ -name "*.sql" -print | sort);
do

echo "**** $SQL_SCRIPT ****"
  psql -q < $SQL_SCRIPT    

done

  1. As the project grows, use new alter scripts to change the table, don't redefine the table in the initial script.

  2. All scripts are checked into source control. Each release is tagged so you can regenerate that version of the database in the future.

  3. For unit testing and CI, most CI servers can run a script to drop and recreate a schema. One oft-cited framework for PostGresql unit testing is pgTAP

like image 1
lreeder Avatar answered Nov 16 '22 12:11

lreeder