Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best way to distribute postgresql

I have a database in postgresql for a software as service with hundreds of customers, currently have a schema of postgresql for each customer, but i like a best solution because the customers rapidly increase. I read about cassandra but i don't wanna lose the integrity of primary,foregin keys and checks. Also read about postgresql in distributed systems, but i dont know what is the best way for implement this currently

like image 988
richie-torres Avatar asked May 01 '12 16:05

richie-torres


People also ask

Can PostgreSQL be distributed?

Based on this definition, we can conclude that EDB Postgres Distributed enables Postgres to be a distributed database management system. If we treat and use EDB Postgres Distributed in that manner, we stand to gain much from the unique qualities of distributed systems, which go beyond extremely high availability.

What is the best way to transfer the data in a PostgreSQL database?

Data export with pg_dump The idea behind this dump method is to generate a file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. PostgreSQL provides the utility program pg_dump for this purpose.

Is PostgreSQL good for data warehouse?

PostgreSQL Data Warehouse leverages OLTP and OLAP to manage streamlined communications between databases. For example, it's easier to store the data and communicate with databases using OLTP using OLAP. These features make PostgreSQL an organization's favorite for OLAP as a data warehouse.


2 Answers

i don't wanna lose the integrity of primary,foregin keys and checks

The point of systems like Cassandra is, once your dataset or workload doesn't fit on a single machine, you have to give up those things even if you stay on postgresql. (I covered the details in a talk that I highly recommend: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-what-every-developer-should-know-about-database-scalability-21-3280648).

So Cassandra is an answer to the question, "If we know we're going to have to give up foreign keys and joins, what can we build by rethinking how we design our database?"

If you never get to that point, Cassandra is overkill. (But you should still watch that talk. :)

like image 36
jbellis Avatar answered Sep 20 '22 18:09

jbellis


There are four levels at which you can separate your customers:

  1. Run a separate PostgreSQL cluster for each customer. This provides maximum separation; each client is on a separate port with its own set of system tables, transaction log, etc.

  2. Put each customer in a separate database in the same cluster. This way they each have a separate login, but on the same port number, and they share global tables like pg_database.

  3. Give each customer a separate schema in the same database. This doesn't require separate user IDs if they are only connecting through your software, because you can just set the search_path. Of course you can use separate user IDs if you prefer.

  4. Make customer_id part of the primary key of each table, and be sure to limit by that in your software. This is likely to scale better than having duplicate tables for each of hundreds of users, but you must be very careful to always qualify your queries by customer_id.

Some people have been known to combine these techniques, for example, limiting each cluster to 100 databases with a separate database for each customer.

Without more detail it's hard to know which configuration will be best for your situation, except to say that if you want to allow users direct access to the database, without going through your software, you need to think about what is visible in system tables with each option. Look at pg_database, pg_user, and pg_class from a user perspective, to see what is exposed.

like image 112
kgrittn Avatar answered Sep 20 '22 18:09

kgrittn