What are the arguments for and against using Greenplum instead of PostgreSQL in a webapp (Django) environment?
My gut reaction is to prefer PostgreSQL's open-source approach and huge knowledge base.
My configuration (though I'd love to hear about any other configuration) is a medium-sized business with 2 web servers and (at the moment) 2 database servers.
Areas to contrast are binary data crunching, number of nodes in the replication, and my personal favorite: community support and skilled engineer support.
What are the pros and cons of using Greenplum instead of PostgreSQL?
I don't know much about Greenplum, except from quickly skimming the link you sent. A data warehouse is not the same thing as a transactional operational data store. The former is for ad hoc queries, statistical analysis, dimensional analysis, and read-mostly access to historical data. The latter is for real-time reads and writes of operational data. They're complementary.
I'm guessing that you want PostgreSQL.
Who is pushing Greenplum on you and why? If it's being presented as an alternative, I'd dig deeper and rebut the argument.
Greenplum is an MPP analytical (OLAP) DBMS. PostgreSQL is an OLTP DBMS. In general, no single solution on the market is good at both OLAP and OLTP at the same time; you can find my thoughts on it here.
A web app backend will always generate an OLTP workload. Greenplum has a big overhead for transaction processing because it is a distributed system, so don't expect it to deliver more than 500-600 TPS. Postgres, in contrast, can reach hundreds of thousands of TPS with the right tuning.
On the other hand, when you need an OLAP workload, Postgres can offer you only single-host processing: no partitioning with dynamic partition elimination, no compression, no columnar store. Greenplum, meanwhile, can crunch your data in parallel across the cluster.
So the solution you are looking for is a typical data warehouse case: use an OLTP solution for the high transactional workload, extract the data to the DWH with ETL/ELT, and then run complex data-crunching queries on it.
At the moment both PostgreSQL and Greenplum are open source products, so you are free to choose either of them, but of course the PostgreSQL community is bigger ATM.
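To make the OLTP-plus-warehouse split concrete, here is a minimal sketch of the pattern, not a production pipeline. It assumes a hypothetical `orders` table on the transactional side and a hypothetical `fact_orders` table on the warehouse side, and it uses two in-memory `sqlite3` databases purely as stand-ins so the example runs anywhere; in a real setup the first connection would point at PostgreSQL and the second at Greenplum. The function names are illustrative as well.

```python
import sqlite3

# Stand-ins (assumption): `oltp` would be PostgreSQL and `dwh` would be Greenplum.
# sqlite3 is used here only so that the sketch is self-contained.
oltp = sqlite3.connect(":memory:")
dwh = sqlite3.connect(":memory:")

# Hypothetical schemas, invented for illustration.
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
dwh.execute("CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")


def place_order(customer, amount):
    """OLTP side: one small write transaction per web request."""
    with oltp:  # commits on success, rolls back on error
        oltp.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)", (customer, amount)
        )


def run_etl():
    """Copy rows the warehouse has not seen yet; return how many were loaded."""
    last_id = dwh.execute("SELECT COALESCE(MAX(id), 0) FROM fact_orders").fetchone()[0]
    rows = oltp.execute(
        "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    with dwh:
        dwh.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    return len(rows)


def revenue_by_customer():
    """OLAP side: an aggregate over the full history, run against the warehouse."""
    return dwh.execute(
        "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer ORDER BY customer"
    ).fetchall()


place_order("alice", 10.0)
place_order("bob", 5.0)
place_order("alice", 2.5)
loaded = run_etl()
report = revenue_by_customer()
```

The web app only ever touches the transactional store, and the heavy aggregate only ever touches the warehouse copy. A real pipeline would typically batch by timestamp or use a dedicated ETL tool rather than a max-id watermark, which is used here only for brevity.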
Greenplum is an MPP adaptation of PostgreSQL. It's optimized for warehousing and/or analytics on large sets of data and would not perform that well in a transactional environment. If you need a large DW environment, look at Greenplum. If you need OLTP or smaller DB sizes (under 10TB), then look at PostgreSQL.