I have some expirience with Apache Spark and Spark-SQL. Recently I've found Apache Drill project. Could you describe me what are the most significant advantages/differences between them? I've already read Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill) but this topic is still unclear for me.
Some of the drawbacks of Apache Spark are there is no support for real-time processing, Problem with small file, no dedicated File management system, Expensive and much more due to these limitations of Apache Spark, industries have started shifting to Apache Flink– 4G of Big Data.
Apache Drill enables analysts, business users, data scientists and developers to explore and analyze this data without sacrificing the flexibility and agility offered by these datastores. Drill processes the data in-situ without requiring users to define schemas or transform data.
Hadoop, Splunk, Cassandra, Apache Beam, and Apache Flume are the most popular alternatives and competitors to Apache Spark.
Here's an article I came across that discusses some of the SQL technologies: http://www.zdnet.com/article/sql-and-hadoop-its-complicated/
Drill is fundamentally different in both the user's experience and the architecture. For example:
Drill 1.0 was just released (May 19, 2015). You can easily download it onto your laptop and play with it without any infrastructure (Hadoop, NoSQL, etc.).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With