
What is the difference between AWS Elastic MapReduce and AWS Redshift?

I see that AWS Elastic MapReduce and AWS Redshift both use a cluster structure and can be used for data analysis. What are the different use cases for them?

Amazon Redshift supports client connections with many types of applications, including business intelligence (BI), reporting, data, and analytics tools.

Amazon Elastic MapReduce (Amazon EMR) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.

Cenxui asked Jun 04 '16 06:06



1 Answer

You are correct that both Amazon EMR and Amazon Redshift are clustered systems that can scale out to offer more computing power. However, there are some very distinct differences between the two services.

Amazon EMR provides Apache Hadoop and applications that run on Hadoop. It is a very flexible system that can read and process unstructured data and is typically used for processing Big Data. However, learning Hadoop and related technologies can be quite difficult. ("With great power comes great responsibility!")

Amazon Redshift is a petabyte-scale data warehouse that is accessed via SQL. Data must be loaded into Redshift before being queried, which often requires some form of transformation ("ETL").
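
As a rough illustration of that loading step, the sketch below builds a Redshift `COPY` statement that pulls CSV files from Amazon S3 into a table. The table name, bucket path, and IAM role ARN are placeholders that do not come from the question, and `build_copy_statement` is a hypothetical helper: it only produces the SQL text, which would still have to be run through a SQL client connected to the cluster.

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str,
                         skip_header: bool = True) -> str:
    """Return a Redshift COPY statement that loads CSV files from S3.

    Hypothetical helper for illustration; all argument values are placeholders.
    Values are interpolated directly, so only pass trusted input.
    """
    lines = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS CSV",
    ]
    if skip_header:
        # Skip the first line of each file when it holds column names.
        lines.append("IGNOREHEADER 1")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_copy_statement(
        table="sales",  # placeholder table name
        s3_path="s3://example-bucket/sales/",  # placeholder bucket
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

Once the data is loaded, it can be queried with ordinary SQL like any other table.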

So which one to choose?

  • If you want to use SQL and you have structured data (e.g. CSV files), then Redshift is the simplest solution.
  • If you want to process unstructured data (e.g. data in irregular formats rather than structured CSV files), Amazon EMR can provide a Hadoop system that is very capable.
  • Sometimes people use both -- use Hadoop to transform data, then use Redshift for querying the data.
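
To make the "transform, then query" pattern more concrete, here is a minimal sketch of the kind of transformation a Hadoop job might perform: turning free-form log lines into CSV rows that a SQL data warehouse can load. The log format, the field names, and the functions `to_csv_row` and `transform` are all invented for this example; a real job would run this sort of logic at scale through a framework such as Hadoop or Spark rather than in a single Python process.

```python
import csv
import io
import re

# Assumed line format (invented for this example):
#   2016-06-04T06:06:00 INFO user=alice action=login
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"user=(?P<user>\S+)\s+action=(?P<action>\S+)$"
)


def to_csv_row(line: str):
    """Convert one raw log line to a CSV row, or return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    buffer = io.StringIO()
    # The csv module handles quoting if a field ever contains a comma.
    csv.writer(buffer, lineterminator="").writerow(
        [match["timestamp"], match["level"], match["user"], match["action"]]
    )
    return buffer.getvalue()


def transform(lines):
    """Yield a CSV row for every input line that matches the assumed format."""
    for raw_line in lines:
        row = to_csv_row(raw_line)
        if row is not None:
            yield row
```

Because `transform` accepts any iterable of lines, the same function could be fed from a file or from standard input, with malformed lines silently dropped. Whether to drop, count, or divert such lines is a design choice that depends on the data.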

If Amazon Redshift can fit your needs, then use it rather than Hadoop. Redshift is simpler to use because it presents itself as a standard SQL database that you can get going in a few minutes. All the cluster management happens behind the scenes and you don't have to know much to use it.

If you need more flexible capabilities and you don't mind getting low-level and technical, then Hadoop on Amazon EMR will offer you more capabilities.

John Rotenstein answered Nov 16 '22 01:11
