I see that AWS Elastic MapReduce and AWS Redshift both use a cluster structure and can be used for data analysis. What are the different use cases for them?
Amazon Redshift supports client connections with many types of applications, including business intelligence (BI), reporting, data, and analytics tools.
Amazon Elastic MapReduce (Amazon EMR) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.
You are correct that both Amazon EMR and Amazon Redshift are clustered systems that can scale out to offer more computing power. However, there are some distinct differences between the two services.
Amazon EMR provides Apache Hadoop and applications that run on Hadoop. It is a very flexible system that can read and process unstructured data and is typically used for processing Big Data. However, learning Hadoop and related technologies can be quite difficult. ("With great power comes great responsibility!")
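To give a feel for the programming model behind Hadoop, here is a minimal sketch of a MapReduce-style word count in plain Python. It is only an illustration of the map, shuffle, and reduce steps on a few lines of free-form text. It does not use Hadoop or any EMR API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one raw text line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical unstructured input, e.g. raw log lines.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

On a real cluster the map and reduce steps run in parallel across many nodes, and the framework handles the shuffle. That distribution is where most of the learning curve comes from.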
Amazon Redshift is a petabyte-scale data warehouse that is accessed via SQL. Data must be loaded into Redshift before being queried, which often requires some form of transformation ("ETL").
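As a rough sketch of the "load first, then query" workflow, the snippet below builds the two SQL statements involved: a Redshift COPY command that loads CSV files from Amazon S3, and an ordinary SQL query against the loaded table. The table name, S3 path, IAM role ARN, and column name are all placeholders invented for this example, and the helper functions are hypothetical. You would run the resulting strings through whatever SQL client you already use.

```python
def build_copy_statement(table, s3_path, iam_role_arn):
    """Build a Redshift COPY statement that loads CSV files from S3.

    Plain string formatting is used for readability only; do not pass
    untrusted input to this function.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


def build_report_query(table, group_column):
    """Build a simple aggregate query to run once the data is loaded."""
    return (
        f"SELECT {group_column}, COUNT(*) AS row_count\n"
        f"FROM {table}\n"
        f"GROUP BY {group_column}\n"
        "ORDER BY row_count DESC;"
    )


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    print(build_copy_statement(
        table="sales",
        s3_path="s3://example-bucket/sales/",
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
    print(build_report_query(table="sales", group_column="region"))
```

Once the COPY has finished, the query side is ordinary SQL, which is why existing BI and reporting tools can connect to Redshift directly.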
So which one to choose?
If Amazon Redshift can fit your needs, then use it rather than Hadoop. Redshift is simpler to use because it presents itself as a standard SQL database that you can get going in a few minutes. All the cluster management happens behind the scenes and you don't have to know much about it to use the service.
If you need more flexible capabilities and you don't mind getting low-level and technical, then Hadoop on Amazon EMR will offer you more capabilities.