Postgres 9.0 and pgpool replication : single point of failure?

My application uses PostgreSQL 9.0 and consists of one or more stations that interact with a global database: it is like a common client/server application, but to avoid any additional hardware, every station includes both the client and the server. One main station is promoted to act also as the server, and the others act as clients to it. This solution lets me scale: a user may initially need a single station but can decide to add more in the future, without a useless separate server in the initial phase.

I'm trying to avoid the situation where, if the main station goes down, all the others stop working. To do this, the best solution seems to be continuously replicating the main database to an unused database on one or more of the other stations.

While searching I've found that pgpool could meet my needs, but from all the examples and tutorials it seems that the point of failure simply moves from the main database to the server that runs pgpool.

I've read something about running multiple pgpool instances together with the heartbeat tool, but it isn't clear how to set this up.

Considering my architecture, where no separate, dedicated servers exist, can someone give me some hints? In the case of failover, pgpool seems to do everything automatically; can I assume that a failover situation can be handled by a regular user without an administrator's intervention?

asked Feb 03 '23 by doc
1 Answer

For this kind of application I really like Amazon's Dynamo design. The document at the link is quite long, but it is worth reading. In fact, there are applications that already implement this approach:

  • MongoDB
  • Cassandra
  • Project Voldemort

There may be others I'm not aware of. Cassandra started at Facebook, and Voldemort is the one used by LinkedIn. By distributing things and adding redundancy to your data distribution, you step away from the traditional master-slave replication approaches.

If you'd like to stay with PostgreSQL, it shouldn't be a big deal to implement such an approach. You will need an extra layer (a proxy) that decides, based on pre-configured options, how to retrieve and save the data.

The proxying layer can be implemented in:

  • application (requires lots of work IMHO; a rough sketch follows this list);
  • database;
  • as a middleware.
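
As a rough illustration of the application-level option, here is a minimal sketch in Python, assuming the psycopg2 driver and two hypothetical stations, "main-station" (read/write) and "station-2" (a read-only copy); all host, database and user names are placeholders, not anything prescribed by PostgreSQL:

    # Minimal sketch of an application-level read/write routing layer.
    # "main-station", "station-2", "appdb" and "app" are hypothetical placeholders.
    import random

    import psycopg2

    WRITE_DSN = "host=main-station dbname=appdb user=app"
    READ_DSNS = [
        "host=main-station dbname=appdb user=app",
        "host=station-2 dbname=appdb user=app",
    ]

    def get_connection(readonly=False):
        """Send writes to the main station, spread reads across all stations."""
        dsn = random.choice(READ_DSNS) if readonly else WRITE_DSN
        return psycopg2.connect(dsn)

The hard part is not this routing but keeping the copies consistent and deciding what to do when the main station is unreachable, which is exactly where the development effort mentioned below goes.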

You can use PL/Proxy at the middleware layer; the project originated at Skype. It is deeply integrated with PostgreSQL, so I'd say it is a combination of options 2 and 3. PL/Proxy requires you to use functions for all kinds of queries against the database. If you run into performance issues, PgBouncer can be used as well.
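
To give a feel for what "functions for all kinds of queries" means on the client side, here is a minimal sketch, assuming psycopg2 and a hypothetical get_user_email() function that would be published on the proxy node (the host name and the function are placeholders, not part of PL/Proxy itself):

    # Sketch of function-only access, as PL/Proxy imposes: the application never
    # sends ad-hoc SQL, it only calls functions exposed by the proxy database.
    # "proxy-node", "appdb" and get_user_email() are hypothetical placeholders.
    import psycopg2

    conn = psycopg2.connect("host=proxy-node dbname=appdb user=app")
    with conn.cursor() as cur:
        cur.execute("SELECT get_user_email(%s)", ("alice",))
        print(cur.fetchone()[0])
    conn.close()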

Last note: whichever way you decide to go, a certain amount of development will be required.

EDIT:

It all depends on what you call a “failure” and what you consider to be an interrupted state of the system.

Let's look at pgpool's features (a short connection sketch follows this list).

  1. Connection Pooling: PostgreSQL uses a single process (fork) per session. Obviously, if you have a very busy site, you'll hit the OS limit. Connection poolers are used to overcome this. They also let you use your resources more evenly, so it's generally a good idea to have a pooler in front of your database.
    In case of a pgpool outage you'll face a large number of clients unable to reach your database. If you point them directly at the database, bypassing the pooler, you'll face performance issues.

  2. Replication: All your queries are auto-replicated to the slave instances. This applies to DML and DDL queries.
    In case of a pgpool outage your replication stops and the slaves will not be able to catch up with the master, as there is no change tracking done outside pgpool (as far as I know).

  3. Load Balancing: Your read-only queries are spread across several instances, achieving nice response times and allowing you to put more load on the system.
    In case of a pgpool outage your queries will suddenly run much slower, if the system can handle such a load at all. And that is only if the master database can take over the work in place of the failed pgpool.

  4. Limiting Exceeding Connections: pgpool queues connections when they cannot be processed immediately.
    In case of a pgpool outage all such connections are aborted, which might break the DB/application protocol, i.e. if the application was designed to never get connection aborts.

  5. Parallel Query: A single query is executed on several nodes to reduce response time.
    In case of a pgpool outage such queries are no longer possible, resulting in longer processing times.
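
To make it concrete how central pgpool is in all of the above: the application talks only to pgpool's listen port (9999 by default) instead of PostgreSQL's 5432, and pooling, replication and load balancing all happen behind that single endpoint. A minimal sketch, assuming psycopg2 and a hypothetical host "main-station" with placeholder credentials:

    # Minimal sketch: the client connects to pgpool (default port 9999) rather
    # than to PostgreSQL directly; nothing else in the application changes.
    # "main-station", "appdb" and "app" are hypothetical placeholders.
    import psycopg2

    conn = psycopg2.connect(host="main-station", port=9999, dbname="appdb", user="app")
    with conn.cursor() as cur:
        cur.execute("SELECT 1")  # read-only queries may be load-balanced to any backend
        print(cur.fetchone())
    conn.close()

That single endpoint is also exactly why a pgpool outage hits every one of the features above at once.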

If you're fine with facing such conditions and don't treat them as failures, then pgpool can serve you well. But if 5 minutes of outage will cost your company several thousand dollars, then you should look for a more solid solution.

The higher the cost of an outage, the more fine-tuned your failover system should be. Typically it is not just a single tool that is used to achieve failover automation. For each failure you will have to handle:

  • DNS, unless you want to reconfigure all clients;
  • re-initializing backups and failover procedures;
  • making sure the old master will not try to fight for its role in case it comes back (STONITH);
  • in my experience it is people from the DBA, SysAdmin, Architecture and Operations departments who decide on the proper strategies.

Finally, in my view, pgpool is a good tool, and I do use it. But it is not designed as a complete failover solution, not without extra thinking, measures taken and scripts written. That is why I provided links to the distributed databases: they offer a much higher level of availability.

And PostgreSQL can be made distributed with a little effort, thanks to its great extensibility.

answered Mar 28 '23 by vyegorov