I have a PostgreSQL database for a software-as-a-service product with hundreds of customers. Currently I have a separate PostgreSQL schema for each customer, but I'd like a better solution because the number of customers is increasing rapidly. I've read about Cassandra, but I don't want to lose the integrity of primary keys, foreign keys, and checks. I've also read about PostgreSQL in distributed systems, but I don't know the best way to implement this currently.
I don't want to lose the integrity of primary keys, foreign keys, and checks
The point of systems like Cassandra is that once your dataset or workload doesn't fit on a single machine, you have to give up those things anyway, even if you stay on PostgreSQL. (I covered the details in a talk that I highly recommend: http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-what-every-developer-should-know-about-database-scalability-21-3280648).
So Cassandra is an answer to the question, "If we know we're going to have to give up foreign keys and joins, what can we build by rethinking how we design our database?"
If you never get to that point, Cassandra is overkill. (But you should still watch that talk. :)
There are four levels at which you can separate your customers:
Run a separate PostgreSQL cluster for each customer. This provides maximum separation; each client is on a separate port with its own set of system tables, transaction log, etc.
Put each customer in a separate database in the same cluster. This way they each have a separate login, but on the same port number, and they share global tables like pg_database.
Give each customer a separate schema in the same database. This doesn't require separate user IDs if they are only connecting through your software, because you can just set the search_path. Of course you can use separate user IDs if you prefer.
Make customer_id part of the primary key of each table, and be sure to limit by that in your software. This is likely to scale better than having duplicate tables for each of hundreds of users, but you must be very careful to always qualify your queries by customer_id.
Some people have been known to combine these techniques, for example, limiting each cluster to 100 databases with a separate database for each customer.
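As a rough sketch, options 3 and 4 might look like this in SQL (the table, schema, and column names here are just illustrative, not from your application):

```sql
-- Option 3: one schema per customer, selected via search_path.
CREATE SCHEMA customer_42;
CREATE TABLE customer_42.orders (id serial PRIMARY KEY, total numeric);

SET search_path TO customer_42;
SELECT * FROM orders;  -- resolves to customer_42.orders

-- Option 4: shared tables, with customer_id in every primary key.
CREATE TABLE orders (
    customer_id integer NOT NULL,
    order_id    integer NOT NULL,
    total       numeric,
    PRIMARY KEY (customer_id, order_id)
);

-- Every query must then be qualified by customer_id:
SELECT * FROM orders WHERE customer_id = 42;
```

With option 4 the foreign keys between shared tables also become composite (referencing both customer_id and the row key), which is part of why you must be disciplined about qualifying every statement.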
Without more detail it's hard to know which configuration will be best for your situation, except to say that if you want to allow users direct access to the database without going through your software, you need to think about what is visible in the system tables under each option. Look at pg_database, pg_user, and pg_class from a user's perspective to see what is exposed.
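For example, connected as an ordinary user, you might run queries like these to see what a tenant could learn about other tenants (a quick sketch against the standard system catalogs):

```sql
-- Databases in the cluster are visible to every connected role:
SELECT datname FROM pg_database;

-- So are all role names:
SELECT usename FROM pg_user;

-- Ordinary tables in the current database, with their schemas:
SELECT relname, relnamespace::regnamespace AS schema
  FROM pg_class
 WHERE relkind = 'r';
```

If any of these reveal customer names you consider sensitive (for example, a schema or database named after each client), that may rule out the corresponding separation level for direct-access scenarios.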