
AWS EMR with only master & Task nodes

Tags:

emr

Is it possible to build an AWS EMR cluster with a master node and a set of task (slave) nodes, without core nodes, when I am sure that the source data is in S3 and the processed result is going to be stored in S3?

Basically, the question is: what is the need for a DataNode process when EMR is going to process data in S3 (where we do not store or use anything in HDFS)?

Vijayanand asked Jul 20 '15 20:07





1 Answer

Core nodes in EMR provide compute resources as well as HDFS storage. In Hadoop 2.x the compute side is provided by the YARN NodeManager, which runs alongside the HDFS DataNode on each core node. Even if an application's input and output are both on S3, YARN (and often other application layers such as Hive) uses HDFS to stage jars, split information, session data, and so on.
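To illustrate the point, here is a minimal sketch of a cluster layout that keeps a small core group for HDFS staging and puts most of the compute on task nodes. It only builds a plain Python list in the shape that boto3's EMR `run_job_flow` call accepts under `Instances["InstanceGroups"]`; it does not call AWS. The helper names, the group names, and the `m5.xlarge` instance type are assumptions chosen for illustration, not something taken from the question or answer.

```python
def build_instance_groups(task_count, core_count=1, instance_type="m5.xlarge"):
    """Return an InstanceGroups list with one master, a core group, and
    an optional task group. Names and instance type are placeholders."""
    groups = [
        {
            "Name": "master",
            "InstanceRole": "MASTER",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            # Core nodes run the HDFS DataNode as well as the YARN NodeManager,
            # so this group is what gives YARN and Hive a place to stage files.
            "Name": "core",
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        groups.append(
            {
                # Task nodes add compute only; they do not store HDFS blocks.
                "Name": "task",
                "InstanceRole": "TASK",
                "InstanceType": instance_type,
                "InstanceCount": task_count,
            }
        )
    return groups


def hdfs_node_count(groups):
    """Count the instances that would host HDFS (the CORE role only)."""
    return sum(g["InstanceCount"] for g in groups if g["InstanceRole"] == "CORE")


if __name__ == "__main__":
    groups = build_instance_groups(task_count=4)
    print([g["InstanceRole"] for g in groups])
    print(hdfs_node_count(groups))
```

With this layout the S3-to-S3 job still does its heavy lifting on the task group, while the single core node covers the small amount of HDFS that the framework itself needs.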

ChristopherB answered Sep 30 '22 04:09
