We have a cluster (Hadoop, Pig) that churns through 350 GB of data, growing by a couple of GB a week. All of this data needs to be made available for analytics. We have a MySQL solution with a star schema (only part of the data is loaded into it), but my concern is how far it can be stretched. Should I be looking at a NoSQL option like Hive for data analytics? I read this article: http://anders.com/cms/282/Distributed.Data/Hadoop/Hbase/Hive

How big is big data, and when should I be looking away from MySQL? Will the structural rigidity of MySQL cause problems? Currently the data in MySQL is only a few GB, but it will certainly grow. What about MySQL clustering? Should I be going down this path at all?


NoSql or MySQL for Data Analytics

4 Answers

350Gb (growing couple of GB a week)... All these data need to be made available for Analytics

Do you have MySQL gurus in house? If yes, sure => just create and grow that MySQL cluster. The problem with this solution is not that it is MySQL, nor that it is not NoSQL => it is simply that it requires an expert to set it up and to be on hand whenever it needs to change. But guess what => SQL is MUCH better and simpler for analytics than a map/reduce-ish SQL simulation.
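To make the "SQL is simpler for analytics" point concrete, here is a minimal sketch that computes the same per-group total twice: once with a single `GROUP BY`, and once as hand-written map, shuffle, and reduce steps. Everything in it is an assumption made for illustration: the `sales` table, the sample rows, and the function names are hypothetical, and SQLite stands in for MySQL only so the snippet runs without a server.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) rows from an imaginary sales table.
ROWS = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0), ("west", 1.0)]


def totals_with_sql(rows):
    """Aggregate with one declarative SQL statement."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    cursor = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    result = dict(cursor.fetchall())
    conn.close()
    return result


def totals_with_map_reduce(rows):
    """The same aggregate spelled out as map, shuffle, and reduce steps."""
    # Map: emit one (key, value) pair per input row.
    mapped = [(region, amount) for region, amount in rows]
    # Shuffle: collect all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: fold each group of values into a single total.
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    print(totals_with_sql(ROWS))
    print(totals_with_map_reduce(ROWS))
```

Both functions return the same mapping. The difference is that the SQL version states what is wanted, while the map/reduce version has to spell out how to compute it, and that gap widens once joins and several aggregates are involved.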

Something that can become a problem later with a MySQL solution is Oracle. So make sure you understand which MySQL features you can use for free and which you would have to pay for.

If you do not have a MySQL expert in house, or you would not like to pay for one, you can definitely turn to NoSQL. That does not mean you would not need expertise in the NoSQL product, but configuring and running X nodes as a single system is an extremely simple and natural process for NoSQL solutions.

For example, in Riak, and a couple of other NoSQL beasts, most of the distribution complexities are solved by the product without you needing to do anything at all => it really is that simple.

The price you pay with NoSQL is losing SQL (think of its nice aggregation features) and consistency, which becomes eventual. If you are strictly doing analytics, losing strict consistency may not be a price at all.

In return you get very natural Big Data handling, fault tolerance and much more.

If you are in the Hadooooxyz space, and you are okay with paying, take a look at Hadapt, which promises 5 times the performance of Hive.

answered Oct 21 '22 20:10

tolitius

The question is of course now many months old, but... I recently came across InfiniDB, which puts a MySQL front end on a highly scalable, MapReduce-based Big Data engine aimed specifically at analytics. It may be a solution for this problem: in principle it should drop in and require very little administration and few code changes. Scaling up on one box or out across multiple servers is supported...

answered Oct 21 '22 19:10

drive-by poster

You switch when you start having the kinds of problems outlined in something like this comparative question: https://dba.stackexchange.com/questions/5/what-are-the-differences-between-nosql-and-a-traditional-rdbms

Other than that, it's a little difficult to answer the question beyond general advice, because you don't pose a specific problem that you are trying to solve (e.g. scaling, read speed, the problems with requiring 100% consistency, etc.).

answered Oct 21 '22 21:10

jefflunt

InfiniDB is not free.

Check out http://code.google.com/p/shard-query

This is like Map-Reduce over a sharded shared-nothing set of databases. Works great for STAR schemas. Shard the fact table over N nodes and duplicate the dimension tables on each server.
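As a rough illustration of that layout (not Shard-Query's actual API or implementation), the sketch below splits a fact table across N in-memory SQLite databases, copies the dimension table to each one, runs the same star-schema query on every shard, and merges the partial sums. The table names, sample rows, shard count, and the modulo sharding rule are all assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

N_SHARDS = 3  # hypothetical number of shared-nothing nodes

# Hypothetical star schema: a small dimension table and a larger fact table.
PRODUCTS = [(1, "books"), (2, "games"), (3, "books")]  # (product_id, category)
SALES = [  # (sale_id, product_id, amount)
    (1, 1, 10.0), (2, 2, 4.0), (3, 3, 6.0),
    (4, 1, 2.0), (5, 2, 8.0), (6, 3, 1.0),
]

QUERY = """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
"""


def build_shards():
    """Create N independent databases, each with part of the fact table."""
    shards = []
    for shard_no in range(N_SHARDS):
        conn = sqlite3.connect(":memory:")  # each connection plays one server
        conn.execute(
            "CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT)"
        )
        conn.execute(
            "CREATE TABLE fact_sales "
            "(sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL)"
        )
        # The dimension table is duplicated in full on every shard.
        conn.executemany("INSERT INTO dim_product VALUES (?, ?)", PRODUCTS)
        # The fact table is sharded: each row lives on exactly one shard.
        own_rows = [row for row in SALES if row[0] % N_SHARDS == shard_no]
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", own_rows)
        shards.append(conn)
    return shards


def total_by_category(shards):
    """Run the same query on every shard, then merge the partial sums."""
    totals = defaultdict(float)
    for conn in shards:  # "map": each shard joins and aggregates its own rows
        for category, partial_sum in conn.execute(QUERY):
            totals[category] += partial_sum  # "reduce": combine partial results
    return dict(totals)


if __name__ == "__main__":
    print(total_by_category(build_shards()))
```

Because every shard holds a full copy of the dimension table, each join runs locally and only small per-category partial results cross shard boundaries. The merge step here works because `SUM` can be combined by adding partial sums; an aggregate such as `AVG` would need each shard to return both a sum and a count.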

You can check out this blog post for more info and performance testing results:

http://www.mysqlperformanceblog.com/2011/05/06/scale-out-mysql/

FYI: I'm the author of Shard-Query.

answered Oct 21 '22 19:10

Justin Swanhart




Tags:

mysql

nosql

hive

AlgoMan
