I am currently working on a few projects with MongoDB and Apache Cassandra. I also use Solr a lot, and I am handling "lots" of data with them (approx. 1-2 TB). I heard of Greenplum and Vertica for the first time last week, and I am not really sure where to place them. They look like data warehouse (DWH) solutions to me, and I haven't really worked with DWHs. They also seem to cost a lot of money (e.g. $60k for 1 TB of storage in Greenplum). I am not currently handling petabytes of data and don't think I will, but products like Cassandra also seem to be able to handle that:
Cassandra is the acknowledged NoSQL leader when it comes to comfortably scaling to terabytes or petabytes of data.
via http://www.datastax.com/why-cassandra
So my question: why should people use Greenplum & Co.? Is there a huge advantage compared to these other products?
Thanks.
I work in the telecom industry. We deal with large data sets and complex EDW (enterprise data warehouse) models. We started with Teradata, and it was good for a few years. Then the data grew exponentially, and, as you know, expanding Teradata is expensive. So we evaluated EMC Greenplum, Oracle Exadata, HP Vertica, and IBM Netezza.
In speed, generating 20 reports ranked like this: 1. Vertica, 2. Netezza, 3. Greenplum, 4. Oracle.
In compression ratio: Vertica had a natural advantage. Among the others, IBM is good too. The worst per the benchmarks were EMC and Oracle, which is to be expected, since both want to sell tons of storage and hardware.
Scalability: All do scale well.
Loading time: EMC is the best here; the others (Teradata, Vertica, Oracle, IBM) are good too.
Concurrent user queries: Vertica, then EMC Greenplum, and only then IBM. Oracle Exadata is comparatively slow for every type of query, but much better than its old-school 10g.
Price: Teradata > Oracle > IBM > HP > EMC
Note: you need to compare apples to apples: the same number of cores, RAM, data volume, and reports.
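A minimal sketch of the apples-to-apples rule, assuming each benchmark run is recorded together with its hardware and workload settings. The `Setup` type and `rank_by_runtime` helper are hypothetical: they simply refuse to rank systems unless every run used identical cores, RAM, data volume, and report count. The system names and timings in the example are placeholders, not results from this evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Hypothetical description of one benchmark environment."""
    cores: int
    ram_gb: int
    data_tb: float
    reports: int


def rank_by_runtime(results):
    """Rank systems fastest-first by total report runtime.

    `results` maps a system name to (Setup, total_seconds). Raises if the
    runs did not all share one identical Setup, because the timings would
    then not be comparable.
    """
    setups = {setup for setup, _ in results.values()}
    if len(setups) != 1:
        raise ValueError("runs used different setups; not an apples-to-apples comparison")
    return sorted(results, key=lambda name: results[name][1])


# Placeholder numbers for illustration only.
same = Setup(cores=32, ram_gb=256, data_tb=2.0, reports=20)
print(rank_by_runtime({"system_a": (same, 540.0), "system_b": (same, 310.0)}))
# ['system_b', 'system_a']
```

Making `Setup` frozen makes it hashable, so collecting the setups into a set is a cheap way to check that they are all identical.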
We chose Vertica for its hardware-independent pricing model, lower price, and good performance. Now all 40+ users are happy to generate reports without waiting, and it all fits on low-cost HP DL380 servers. It is great for the OLAP/EDW use case.
All this analysis applies only to the EDW/analytics/OLAP case. I am still an Oracle fanboy for OLTP, rich PL/SQL, connectivity, etc. on any hardware or system. Exadata handles mixed workloads decently, but its price/performance ratio is unreasonable, and you still need to migrate 10g code to Exadata best practices (MPP-style design, bulk processing, etc.), which is more time-consuming than they claim.
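On the compression-ratio point above: Vertica stores data by column, and a sorted column tends to contain long runs of repeated values. Run-length encoding (RLE), one of the column encodings Vertica supports, stores each run as a single (value, count) pair. The toy sketch below shows the idea in plain Python; it is not Vertica's actual storage format, and the `region` column is made-up data.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    decoded = []
    for value, length in pairs:
        decoded.extend([value] * length)
    return decoded


# Made-up column, already sorted, as a column store might keep it.
region = ["east"] * 4 + ["north"] * 3 + ["west"] * 5
encoded = rle_encode(region)
print(encoded)  # [('east', 4), ('north', 3), ('west', 5)]
print(len(region), "values stored as", len(encoded), "runs")  # 12 values stored as 3 runs
assert rle_decode(encoded) == region
```

The fewer distinct values a sorted column has, the fewer runs it needs, which is why low-cardinality columns in a column store can shrink dramatically.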
We've been working with Hadoop for 4 years and with Vertica for 2. We had massive loading and indexing problems with our tables in MySQL, and we were running on fumes with our home-grown sharding solution. We could have invested heavily in developing a more sophisticated sharding solution, which would have been quite painful, IMO. We could have thought harder about which data we absolutely needed to keep in a SQL database.
But at the end of the day, we chose to switch from MySQL to Vertica. Vertica's performance patterns are quite different from MySQL's, which comes with its own headaches. But it can load a lot of data very quickly, and it is good at heavy-duty queries that would make MySQL's head spin.
The way I see it, Vertica is a solution when you are already invested in SQL and need a heavier-duty SQL database. I'm not an expert, so I couldn't tell you what a transition to Oracle or DB2 would have been like compared to Vertica, either in terms of integration effort or monetary cost.
Vertica offers a lot of features we've barely looked into. Those might be very attractive to others with use cases different to ours.
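To illustrate why a home-grown sharding scheme can get painful as data grows, here is a sketch of the simplest possible approach, hash-modulo sharding. The answer does not say how their MySQL sharding actually worked, so `shard_for` and `fraction_moved` are hypothetical helpers; the point is only that with modulo placement, adding a single shard relocates most existing rows.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Naive scheme: hash the key, then take the hash modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 5):.0%} of rows move when growing from 4 to 5 shards")
```

With uniformly distributed hashes, a key keeps its shard only when its hash leaves the same remainder mod 4 and mod 5, which holds for 4 of every 20 residues, so roughly 80% of rows have to move. Schemes such as consistent hashing reduce this, at the cost of more machinery to build and operate.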