I'm searching for a NoSQL system (preferably open source) that supports analytic functions (AF for short) the way Oracle, SQL Server, and Postgres do. I didn't find any with built-in functions. I've read about Hive, but it doesn't have real AF features (windows, first/last values, ntiles, lag, lead and so on), just histograms and n-grams. Also, some NoSQL systems (Redis, for example) support map/reduce, but I'm not sure whether AF can be replaced with it.
I want to make a performance comparison to choose between Postgres and a NoSQL system.
So, in short:
1 - NoSQL systems with AF.
2 - Can map/reduce replace AF? Is it fast, reliable, and easy to get going with?
P.S. I tried to make my question more constructive.
5 – Use a NoSQL Data Store Optimized for Analytics
The main benefit of this kind of approach is that there's no need to transform data into a relational structure. Additionally, Elasticsearch leverages its indexing to provide the fast analytics that modern data applications require.
MongoDB provides the tools and APIs that help developers build sophisticated analytics queries. Along with analytics-optimized indexing and storage formats, insights and actions are delivered at low latency with high concurrency.
So, for beginners, starting with SQL and then moving to NoSQL might be the best choice. As a rule of thumb, SQL is a better choice if you're dealing with an RDBMS (relational database management system) and want to analyze the data's behavior or want to build custom dashboards.
Once you've really understood how MapReduce works, you can do amazing things with a few lines of code.
Here is a nice video course:
http://code.google.com/intl/fr/edu/submissions/mapreduce-minilecture/listing.html
The real difficulty factor will be between functions that you can implement with a single MapReduce and those that will need chained MapReduces. Moreover, some nice MapReduce implementations (like CouchDB) don't allow you to chain MapReduces (easily).
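To make the single-versus-chained distinction concrete, here is a minimal in-memory sketch. The `map_reduce` helper and the `sales` data are made up for illustration and do not correspond to any particular NoSQL product's API. The first pass (total per region) fits in one MapReduce; the second question (which regions share a given total) needs the output of the first pass as its input, so it has to be chained.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Hypothetical input: (region, amount) pairs.
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 5), ("west", 1)]

# Pass 1: total per region. A single MapReduce is enough.
totals = map_reduce(
    sales,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda region, amounts: (region, sum(amounts)),
)

# Pass 2: regions grouped by their total. This consumes pass 1's output,
# so it cannot be expressed as part of the same map/reduce step.
regions_per_total = map_reduce(
    totals,
    mapper=lambda rec: [(rec[1], rec[0])],
    reducer=lambda total, regions: (total, sorted(regions)),
)

print(totals)
print(regions_per_total)
```

On a system that cannot chain passes easily, the output of pass 1 would have to be written back as new documents before pass 2 could run.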
Some functions need knowledge of all existing data, for instance when they involve some kind of aggregation (avg, median, standard deviation) or some ordering (first, last).
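A small sketch of why this matters on distributed data, using two hypothetical nodes (`node_a`, `node_b`) with made-up values. An average can be rebuilt from small per-node partial results (sum and count), but a median generally cannot be rebuilt from per-node medians; it needs the values themselves, or at least a global ordering.

```python
from statistics import median

# Hypothetical data held on two separate nodes.
node_a = [1, 2, 3]
node_b = [4, 5, 6, 7, 100]

def partial_mean(values):
    """Per-node partial result: enough to rebuild the global mean later."""
    return (sum(values), len(values))

def combine_means(partials):
    """Merge per-node (sum, count) pairs into the global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# The mean combines cleanly from partial results.
global_mean = combine_means([partial_mean(node_a), partial_mean(node_b)])

# The median does not: combining per-node medians gives a different answer
# than computing the median over all the data.
median_of_medians = median([median(node_a), median(node_b)])
true_median = median(node_a + node_b)

print(global_mean, median_of_medians, true_median)
```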
If you want a distributed NoSQL solution that supports AF out of the box, the system will need to rely on some centralized indexing and metadata to keep information about the data on all nodes, which means having a master node and probably a single point of failure.
You have to ask what you expect to accomplish using NoSQL. Do you want schemaless tables? Distributed data? Better raw performance for very simple queries?
Depending on your needs, I see three main alternatives here:
1 - Use a distributed NoSQL store with no single point of failure (e.g. Cassandra) to hold your data, and use map/reduce to process it and produce the results for the desired function (almost every major NoSQL solution supports Hadoop). The caveat is that map/reduce queries are not real-time (a query can take minutes or hours to execute) and require extra setup and learning.
2 - Use a traditional RDBMS that supports multiple servers, like MySQL Cluster.
3 - Use a NoSQL store with a master/slave topology that supports ad-hoc and aggregation queries, like Mongo.
As for the second question: yes, you can rely on M/R to replace AF. You can do almost anything with M/R.
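As a rough illustration, here is how a `lag`-style function could be expressed in map/reduce terms. This is only a sketch under assumptions: the function name `lag_by_partition` and the row layout are invented for the example, and everything runs in memory rather than on a real cluster. The map/shuffle step groups rows by the partition key; the reduce step sorts each group and pairs every value with its predecessor, which is roughly what `LAG(value) OVER (PARTITION BY ... ORDER BY ...)` does in SQL.

```python
from collections import defaultdict

def lag_by_partition(rows):
    """rows: iterable of (partition, order_key, value).

    Returns a list of (partition, order_key, value, previous_value),
    where previous_value is None for the first row of each partition.
    """
    # Map + shuffle: group rows by partition key.
    groups = defaultdict(list)
    for partition, order_key, value in rows:
        groups[partition].append((order_key, value))

    # Reduce: sort each group and pair every value with its predecessor.
    out = []
    for partition in sorted(groups):
        previous = None
        for order_key, value in sorted(groups[partition]):
            out.append((partition, order_key, value, previous))
            previous = value
    return out

# Hypothetical rows arriving in arbitrary order.
rows = [("a", 2, 20), ("b", 1, 5), ("a", 1, 10), ("a", 3, 30)]
for row in lag_by_partition(rows):
    print(row)
```

Note that the reducer needs a whole partition in one place, so a very large partition limits how well this spreads across nodes.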