Amazon EC2 vs. Amazon EMR [closed]

I have implemented a task in Hive. It currently works fine on my single-node cluster, and I am now planning to deploy it on AWS.

I don't know anything about AWS. If I deploy there, which should I choose: Amazon EC2 or Amazon EMR?

I want to improve the performance of my task. Which one is better and more reliable for my case, and how should I approach each of them? I have also heard that we can register our VM settings on AWS as they are. Is that possible?

Please advise as soon as possible.

Many thanks.

Bhavesh Shah asked Apr 11 '12 05:04


2 Answers

EMR is a collection of EC2 instances with Hadoop (and optionally Hive and/or Pig) installed and configured on them. If you are using your cluster to run Hadoop/Hive/Pig jobs, EMR is the way to go.

An EMR instance costs a little extra compared to a plain EC2 instance. A quick check of Amazon's prices today shows that a small EC2 instance costs $0.08/hour, while a small EMR instance costs $0.015/hour on top of that. In my opinion, it's totally worth paying that extra money to save yourself the hassle of installing and setting up Hadoop (along with Hive and Pig), then creating, maintaining and using your own AMI.

Moreover, EMR's versions of Hadoop and Hive include some patches that are not available (at least, not yet) in Apache Hive. If you use EC2, you will probably be running Apache Hadoop and Hive (or maybe the Cloudera distributions) and won't have access to those patches (like native support for S3, or commands like ALTER TABLE my_table RECOVER PARTITIONS).
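To put the price difference in perspective, here is a rough sketch of the arithmetic using only the two hourly figures quoted above. The function names and the 10-node cluster size are made up for illustration, and the prices are the ones from this answer, not current AWS rates.

```python
# Hourly prices quoted in the answer above (small instance, at the time of writing).
EC2_SMALL_HOURLY = 0.08       # USD per instance-hour for plain EC2
EMR_SURCHARGE_HOURLY = 0.015  # USD per instance-hour added by EMR


def hourly_cost(instances: int, use_emr: bool) -> float:
    """Return the cluster cost per hour for the given number of small instances."""
    per_instance = EC2_SMALL_HOURLY
    if use_emr:
        per_instance += EMR_SURCHARGE_HOURLY
    return instances * per_instance


def emr_premium_fraction() -> float:
    """Return the EMR surcharge as a fraction of the plain EC2 price."""
    return EMR_SURCHARGE_HOURLY / EC2_SMALL_HOURLY


if __name__ == "__main__":
    nodes = 10  # hypothetical cluster size
    print(f"EC2 only: ${hourly_cost(nodes, use_emr=False):.3f}/hour")
    print(f"With EMR: ${hourly_cost(nodes, use_emr=True):.3f}/hour")
    print(f"EMR premium: {emr_premium_fraction():.2%}")
```

At these rates the surcharge is a bit under a fifth of the instance price, which is the extra money being weighed against the setup and maintenance effort.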

References:

  • http://aws.amazon.com/ec2/pricing/
  • http://aws.amazon.com/elasticmapreduce/pricing/
Mark Grover answered Sep 20 '22 01:09


I would suggest that you do NOT try to deploy your own Hadoop cluster unless you have 2-3 months to spare and a Hadoop expert handy.

Elastic MapReduce will let you get started very quickly by providing a pre-configured Hadoop environment. Seeing as you only have a single job, it should be fine.
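As a rough illustration of what "getting started quickly" can look like, the sketch below builds a request for the run_job_flow call of boto3's EMR client: a short-lived cluster with Hive installed that runs one script and then shuts down. Everything specific here is an assumption rather than something from this thread: the cluster name, the release label, the instance type, the region, the S3 paths and the default role names all need to be adapted to your own account.

```python
def build_emr_request(script_s3_uri: str, log_s3_uri: str, worker_count: int = 2) -> dict:
    """Build keyword arguments for boto3's EMR client run_job_flow call."""
    return {
        "Name": "hive-task",           # hypothetical cluster name
        "LogUri": log_s3_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one available to you
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 1 + worker_count,  # one master plus the workers
            # Terminate the cluster once the single step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "run-hive-script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["hive-script", "--run-hive-script",
                             "--args", "-f", script_s3_uri],
                },
            }
        ],
        # Default role names; assumed to already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request: dict, region: str = "us-east-1") -> str:
    """Submit the request and return the new cluster's id (needs AWS credentials)."""
    import boto3  # third-party; imported here so building the request works without it

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling build_emr_request("s3://my-bucket/task.q", "s3://my-bucket/logs/") with your own (here hypothetical) bucket paths and passing the result to launch would start the cluster. The point is that there is no Hadoop or Hive installation step anywhere in the request.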

Matthew Rathbone answered Sep 23 '22 01:09