
Advantages of databases like Greenplum or Vertica compared to MongoDB or Cassandra [closed]

I am currently working on a few projects with MongoDB and Apache Cassandra, respectively. I also use Solr a lot, and I am handling "lots" of data with them (approx. 1-2 TB). I heard of Greenplum and Vertica for the first time last week, and I am not really sure where to place them. They seem to me like data warehouse (DWH) solutions, and I haven't really worked with a DWH. They also seem to cost a lot of money (e.g. $60k for 1 TB of storage in Greenplum). I am not currently handling petabytes of data and don't expect to, but products like Cassandra also seem to be able to handle that:

Cassandra is the acknowledged NoSQL leader when it comes to comfortably scaling to terabytes or petabytes of data.

via http://www.datastax.com/why-cassandra

So my question: Why should people use Greenplum & Co? Is there a huge advantage in comparison to these other products?

Thanks.

H6. asked Jan 24 '12 13:01



2 Answers

I work in the telecom industry. We deal with large data sets and complex EDW (enterprise data warehouse) models. We started with Teradata, and it was good for a few years. Then the data grew exponentially, and as you know, expanding Teradata is expensive. So we evaluated EMC Greenplum, Oracle Exadata, HP Vertica, and IBM Netezza.

Speed (generating 20 reports), fastest first: 1. Vertica, 2. Netezza, 3. Greenplum, 4. Oracle

Compression ratio: Vertica had a natural advantage. Among the others, IBM is good too. The worst in the benchmarks were EMC and Oracle, which is to be expected, since both want to sell a ton of storage and hardware.

Scalability: All do scale well.

Loading time: EMC is the best here; the others (Teradata, Vertica, Oracle, IBM) are good too.

Concurrent user queries: Vertica, then EMC (Greenplum), and only then IBM. Oracle Exadata is comparatively slow for any type of query, but much better than old-school 10g.

Price: Teradata > Oracle > IBM > HP > EMC

Note: You need to compare apples to apples: the same number of cores, RAM, data volume, and reports.

We chose Vertica for its hardware-independent pricing model, lower price, and good performance. Now all 40+ users are happy to generate reports without waiting, and it all fits on low-cost HP DL380 servers. It is great for the OLAP/EDW use case.

All of this analysis applies only to the EDW/analytics/OLAP case. I am still an Oracle fanboy for all OLTP, rich PL/SQL, connectivity, etc., on any hardware or system. Exadata handles a mixed workload decently, but its price/performance ratio is unreasonable, and you still need to migrate 10g code to Exadata best practices (MPP-like design, bulk processing, etc.), which is more time-consuming than they claim.

Arun answered Sep 28 '22 01:09



We've been working with Hadoop for 4 years, and Vertica for 2. We had massive loading and indexing problems with our tables in MySQL. We were running on fumes with our home-grown sharding solution. We could have invested heavily in developing a more sophisticated sharding solution, which would have been quite painful, imo. We could have thought harder about what data we absolutely needed to keep in a SQL database.

But at the end of the day, switching from MySQL to Vertica was what we chose. Vertica performance patterns are quite different from MySQL's, which comes with its own headaches. But it can load a lot of data very quickly, and it is good at heavy duty queries that would make MySQL's head spin.
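For readers unfamiliar with the kind of home-grown sharding mentioned above, here is a minimal sketch of the general idea, not the actual setup described in this answer. It assumes rows are routed to one of several MySQL instances by hashing a row key; the names `shard_for` and `route` are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because the latter
    is randomized per process for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(keys, num_shards):
    """Group keys by the shard they would be written to (illustrative only)."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    placement = route([f"user{i}" for i in range(10)], num_shards=4)
    for shard, members in placement.items():
        print(shard, members)
```

With plain modulo hashing like this, changing `num_shards` remaps most keys to a different shard, which is one reason growing such a scheme tends to be painful.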

The way I see it, Vertica is a solution when you are already invested in SQL and need a heavier-duty SQL database. I'm not an expert, so I couldn't tell you what a transition to Oracle or DB2 would have been like compared to Vertica, either in terms of integration effort or monetary cost.

Vertica offers a lot of features we've barely looked into. Those might be very attractive to others with use cases different to ours.

kimbo305 answered Sep 28 '22 00:09
