Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is Google's Dremel? How is it different from Mapreduce?

Google's Dremel is described here. What's the difference between Dremel and Mapreduce?

like image 518
Yktula Avatar asked Jul 07 '11 08:07

Yktula


People also ask

What is Dremel technology?

Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds.

What is Dremel BigQuery?

Dremel is the query engine used in Google's BigQuery service. Dremel is the inspiration for Apache Drill, Apache Impala, and Dremio, an Apache licensed platform that includes a distributed SQL execution engine.

What is MapReduce used for?

MapReduce serves two essential functions: it filters and parcels out work to various nodes within the cluster or map, a function sometimes referred to as the mapper, and it organizes and reduces the results from each node into a cohesive answer to a query, referred to as the reducer.

How does MapReduce work?

MapReduce facilitates concurrent processing by splitting petabytes of data into smaller chunks, and processing them in parallel on Hadoop commodity servers. In the end, it aggregates all the data from multiple servers to return a consolidated output back to the application.


2 Answers

Dremel and MapReduce are not directly comparable, but rather they are complementary technologies.

MapReduce is not specifically designed for analyzing data - rather it's a software framework that allows a collection of nodes to tackle distributed computational problems for large datasets.

Dremel is a data analysis tool designed to quickly run queries on massive, structured datasets (such as log or event files). It supports a SQL-like syntax, but apart from table appends, it is read-only. It doesn't support update or create functions, nor does it feature table indexes. Data is organized in a "columnar" format, which contributes to very fast query speed. Google's BigQuery product is an implementation of Dremel accessible via RESTful API.

Hadoop (an open source implementation of MapReduce) in conjunction with the "Hive" data warehouse software, also allows data analysis for massive datasets using a SQL-style syntax. Hive essentially turns queries into MapReduce functions. In contrast to using a ColumIO format, Hive attempts to make queries quick by using techniques such as table indexing.

like image 73
Michael Manoochehri Avatar answered Oct 18 '22 18:10

Michael Manoochehri


Check this article out. Dremel is the what the future of hive should (and will) be.

The major issue of MapReduce and solutions on top of it, like Pig, Hive etc, is that they have an inherent latency between running the job and getting the answer. Dremel uses a totally novel approach (came out in 2010 in that paper by google) which...

...uses a novel query execution engine based on aggregator trees...

...to run almost realtime , interactive AND adhoc queries both of which MapReduce cannot. And Pig and Hive aren't real time

You should keep an eye on projects coming out of this. Is is pretty new for me too... so any other expert comments are welcome!

Edit: Dremel is what the future of HIVE (and not MapReduce as I mentioned before) should be. Hive right now provides a SQL like interface to run MapReduce jobs. Hive has very high latency, and so is not practical in ad-hoc data analysis. Dremel provides a very fast SQL like interface to the data by using a different technique than MapReduce.

like image 22
Jai Avatar answered Oct 18 '22 18:10

Jai