Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql replication in rails with data-fabric gem

I am currently setting up a master-slave app using Ruby on Rails. I am planning to use data-fabric or octopus gem for handling the read/write connections.

This is my first time setting up master-slave DBs. I am confused over the various open source tools available to implement the postgresql replication e.g. pgpool II, pgcluster, Bucardo and Hot Standby/Streaming Replication (built in feature in postgresql 9.1)

My requirements are

  • fault tolerance(high availability and no data loss on failover)
  • load balancing

Thanks in advance

Note: I have gone through the stackoverflow post regarding postgresql replication but they are pretty old and not helping to conclude on which tool I should go with.

like image 416
Vishakha Avatar asked Mar 05 '12 12:03

Vishakha


1 Answers

In your case, streaming replication is the place to start. It is not very flexible but it does what you need regarding database reads as long as you don't need to replicate between major versions.

Database Replication 101

Database replication is a way to ensure that data saved to a specific server becomes stored in a number of other servers. This is often done to better utilize more limited network connections, ensure fault tolerance (so there is essentially a hot back-up), ensure that read-only queries can be distributed over a larger number of databases, etc. This all must be done without sacrificing the the basic guarantees of ACID.

There are a number of different overlapping ways to categorize replication solutions. These include:

  • Page or file-level vs row-level vs statement-level

  • Synchronous vs Asynchronous

  • Master-slave vs Multi-Master

In general understanding replication and the tradeoffs between solutions requires relatively strong understanding of database mechanics and ACID guarantees. I will assume you are relatively familiar with storage mechanics, and deterministic vs non-deterministic operations and the like.

What is Being Replicated? File changes (Physical) vs Row Changes (Logical) vs Statements

The simplest approach is to replicate block changes to files, for example as stored in the write-ahead log in PostgreSQL. This replicates changes at the page level and it requires identical file formats. This means you cannot replicate across major versions, CPU architectures, or operating systems. Anything that could affect the alignment of tuples, for example, will cause the replication to either fail or, worse, corrupt the slave's database. This is the approach streaming replication uses. It is simple to set up, and it always replicates everything in the database cluster.

Additionally this approach means you can easily guarantee that the master and slave databases are identical down to the file level. Because of the fact that the PostgreSQL WAL is cluster-global it is unlikely that this approach will ever replicate anything short of the entire database cluster.

As a description of how this works, suppose I:

UPDATE my_table SET rand_value = random() WHERE id > 10000;

In this case, this changes a bunch of data pages and the file operations are replicated to the replicas. The files remain identical between the master and slave.

Another approach, one taken by Slony, Bucardo, and others is to replicate rows in a logical manner. In this approach, changed rows are flagged and logged, and the changes sent to the replicas. The replicas re-run row operations from the master database. Because these are add-on tools which do not replicate file operations but rather logical database operations, they can replicate across CPU architectures, operating systems, etc. Also they are usually designed so that you can replicate some but not all tables in a database, allowing for a lot of flexibility. On the other hand this leads to a lot of potential for errors. "Oops, that table was not replicated" is a real problem.

In this case when I run the update statement above, a trigger is fired capturing the actual rows inserted and deleted and these are logged, replicated, and the row operations re-run. Because this happens after rand() is run, the databases are logically, but not necessarily physically identical.

A final approach is statement replication. In this case we replicate statements and re-run the statements on the replicas. Some configurations of PgPool will do this. In this case, you cannot ensure that a database is logically equivalent to its replica if any non-deterministic functions are run. In the statement above, the statement itself will run on each replica, ensuring different pseudorandom numbers in the relevant column.

Synchronous vs Asynchronous

This distinction is important to understand regarding failover guarantees. In an asynchronous replication system, the updates are queued and transferred when possible to the replicas and re-run there. In a synchronous replication system the database which accepts the write will not return a successful commit until at least a certain number of replica databases report a successful commit.

Asynchronous replication is generally more robust and produces better availability than synchronous replication. This is because synchronous replication introduces additional points of failure. If you have one master and one slave, then if either system goes down, your database becomes unavailable at least for write operations.

The tradeoff though is that synchronous replication offers a guarantee that data which is committed is in fact available on replicas in the event that the master, say, suffers catastrophic hardware failure immediately following commit. This is a very low probability event, but in some cases it is important that you know the data is still available. In short this provides additional durability guarantees not present in async replication.

Multi-Master vs Master-Slave

Most replication systems are master-slave. In this case, all writes begin at one node and are replicated to other nodes. Writes may only begin at one node. They may not begin at other nodes. This makes replication straight-forward because we know that the slaves represent a past state of the master.

Multi-master replication allows writes to occur to more than one node. In an asynchronous replication system, this leads to the problem of conflict resolution. These problems are actually worse than most assume when you add DDL statements. Suppose two different users run the above update statement on two different masters. We will now have a set of records that have to be replicated across but they will conflict.

Multi-master replication typically requires that people think through this conflict resolution process quite carefully. It is never a process that just works out of the box. Often times you write your own conflict resolution routines. For this reason I typically recommend avoiding multi-master replication unless you really need it.

like image 169
Chris Travers Avatar answered Nov 01 '22 05:11

Chris Travers