
NoSQL with analytic functions

I'm searching for any NoSQL system (preferably open source) that supports analytic functions (AF for short) like Oracle/SQL Server/Postgres do. I didn't find any with built-in functions. I've read something about Hive, but it doesn't actually offer AF features (windows, first/last values, ntiles, lag, lead and so on), just histograms and ngrams. Also, some NoSQL systems (Redis, for example) support map/reduce, but I'm not sure whether AF can be replaced with it.

I want to run a performance comparison to choose between Postgres and a NoSQL system.

So, in short:

  1. Searching for NoSQL systems with AF
  2. Can I rely on map/reduce to replace AF? Is it fast, reliable, and easy to get started with?

P.S. I tried to make my question more constructive.

ravnur asked Oct 31 '12 11:10



2 Answers

Once you've really understood how MapReduce works, you can do amazing things with a few lines of code.

Here is a nice video course:

http://code.google.com/intl/fr/edu/submissions/mapreduce-minilecture/listing.html

The real difference in difficulty is between functions that you can implement with a single MapReduce and those that need chained MapReduces. Moreover, some nice MapReduce implementations (like CouchDB) don't let you chain MapReduces easily.
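To make the single versus chained distinction concrete, here is a minimal in-memory sketch in plain Python. It is not tied to CouchDB, Hadoop or any real engine: the `map_reduce` helper, the `orders` data and the `rank_reducer` function are all hypothetical names made up for illustration. A per-user total fits in one pass, while a global ranking of those totals needs a second pass that consumes the output of the first.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Hypothetical sample data: (user, amount)
orders = [("ann", 10), ("bob", 5), ("ann", 30), ("cy", 5), ("bob", 15)]

# Pass 1 (single MapReduce): total amount per user.
totals = map_reduce(
    orders,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda user, amounts: (user, sum(amounts)),
)

def rank_reducer(_key, pairs):
    """Order (total, user) pairs from highest to lowest and number them from 1."""
    ordered = sorted(pairs, reverse=True)
    return [(user, total, rank) for rank, (total, user) in enumerate(ordered, start=1)]

# Pass 2 (chained): every total goes to one key so a single reducer sees them all.
ranked = map_reduce(
    totals,
    mapper=lambda rec: [("all", (rec[1], rec[0]))],
    reducer=rank_reducer,
)[0]

print(totals)
print(ranked)
```

Routing everything to one key in the second pass keeps the sketch short, but it also means a single reducer handles the whole ranking, which is one reason chained jobs tend to be the harder case.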

Aurélien answered Oct 23 '22 22:10


Some functions need knowledge of all existing data, because they involve some kind of aggregation (avg, median, standard deviation) or some ordering (first, last).
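A small sketch of why this matters in a distributed setting, using only the Python standard library. The two "nodes" and their values are made up for illustration. An average can be rebuilt exactly from per-node partial results (sum and count), whereas a median generally cannot be rebuilt from per-node medians and needs to see all of the values.

```python
from statistics import median

# Hypothetical data split across two nodes.
node_a = [1, 2, 3]
node_b = [4, 100]

def partial_avg(values):
    """Per-node partial result: (sum, count) is enough to merge later."""
    return (sum(values), len(values))

def merge_avg(partials):
    """Combine per-node (sum, count) pairs into the exact global average."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Average: merging partials gives the same answer as one pass over all data.
distributed_avg = merge_avg([partial_avg(node_a), partial_avg(node_b)])
global_avg = sum(node_a + node_b) / len(node_a + node_b)

# Median: combining per-node medians does not give the global median here.
median_of_medians = median([median(node_a), median(node_b)])
true_median = median(node_a + node_b)

print(distributed_avg, global_avg)
print(median_of_medians, true_median)
```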

If you want a distributed NoSQL solution that supports AF out of the box, the system will need to rely on some centralized indexing and metadata to keep information about the data on all nodes, which means having a master node and probably a single point of failure.

You have to ask yourself what you expect to accomplish by using NoSQL. Do you want schemaless tables? Distributed data? Better raw performance for very simple queries?

Depending on your needs, I see three main alternatives here:

1 - use a distributed NoSQL system with no single point of failure (e.g. Cassandra) to store your data, and use map/reduce to process the data and produce the results for the desired function (almost every major NoSQL solution supports Hadoop). The caveat is that map/reduce queries are not real-time (a query can take minutes or hours to execute) and require extra setup and learning.

2 - use a traditional RDBMS that supports multiple servers, like MySQL Cluster

3 - use a NoSQL system with a master/slave topology that supports ad-hoc and aggregation queries, like MongoDB

As for the second question: yes, you can rely on M/R to replace AF. You can do almost anything with M/R.
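As a rough illustration of that claim, here is a sketch of how a window function such as `LAG(reading) OVER (PARTITION BY sensor ORDER BY ts)` could be expressed in map/reduce terms. It is plain Python running in memory, not Hadoop or any specific NoSQL API; the `rows` data and the `mapper`, `reducer` and `run` names are hypothetical. The map step emits the partition key, and the reduce step sorts each partition and carries the previous value forward.

```python
from collections import defaultdict

# Hypothetical rows: (sensor, timestamp, reading)
rows = [
    ("s1", 2, 11.0),
    ("s2", 1, 7.0),
    ("s1", 1, 10.0),
    ("s1", 3, 15.0),
    ("s2", 2, 6.5),
]

def mapper(row):
    """Emit the partition key (sensor) with the ordering key and the value."""
    sensor, ts, reading = row
    yield sensor, (ts, reading)

def reducer(sensor, values):
    """Sort one partition by timestamp and attach the previous reading (LAG)."""
    previous = None
    for ts, reading in sorted(values):
        yield (sensor, ts, reading, previous)
        previous = reading

def run(rows):
    """Toy driver: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out

result = run(rows)
for line in result:
    print(line)
```

Note that the reducer has to hold and sort a whole partition, so this works well only when each partition fits comfortably on one worker.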

lstern answered Oct 23 '22 21:10