 

Am I fully utilizing my EMR cluster?

  • Total Instances: I have created an EMR with 11 nodes total (1 master instance, 10 core instances).
  • Job submission: spark-submit myApplication.py


  • Graph of containers: Next, I've got these graphs, which refer to "containers". I'm not entirely sure what containers are in the context of EMR, so it isn't obvious what they're telling me:


  • Actual running executors: Then I've got my Spark history UI, which shows that only 4 executors were ever created.
  • Dynamic allocation: I've got spark.dynamicAllocation.enabled=True, and I can see that in my environment details.
  • Executor memory: Also, the default executor memory is 5120M.

  • Executors: Finally, I've got my executors tab, showing what looks like 3 active executors and 1 dead executor:

So, at face value, it appears to me that I'm not using all my nodes or available memory.

  1. How do I know if I'm using all the resources I have available?
  2. If I'm not using all available resources to their full potential, how do I change what I'm doing so that they are?
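With dynamic allocation enabled, the executor count floats with workload, so 4 executors doesn't by itself mean the cluster is misconfigured; but a common way to force full utilization is to size executors explicitly via `--num-executors`, `--executor-cores`, and `--executor-memory`. The arithmetic below is a rough sketch; the node specs (16 vCPUs, 57344 MB of YARN memory per core node) are hypothetical placeholders — substitute the values for your actual instance type.

```python
# Rough executor-sizing arithmetic (a sketch; the node specs passed in below
# are hypothetical -- use your instance type's vCPU count and the
# yarn.nodemanager.resource.memory-mb value EMR sets for it).
def size_executors(core_nodes, vcpus_per_node, yarn_mem_mb_per_node,
                   cores_per_executor=5, overhead_frac=0.10):
    """Suggest --num-executors / --executor-cores / --executor-memory."""
    executors_per_node = max(1, vcpus_per_node // cores_per_executor)
    # Leave one executor slot free for the YARN ApplicationMaster / driver.
    num_executors = core_nodes * executors_per_node - 1
    mem_per_executor_mb = yarn_mem_mb_per_node // executors_per_node
    # Off-heap overhead is carved out of the YARN container, so request
    # somewhat less heap than the full container size.
    executor_memory_mb = int(mem_per_executor_mb * (1 - overhead_frac))
    return num_executors, cores_per_executor, executor_memory_mb

num, cores, mem = size_executors(core_nodes=10, vcpus_per_node=16,
                                 yarn_mem_mb_per_node=57344)
print(f"spark-submit --num-executors {num} --executor-cores {cores} "
      f"--executor-memory {mem}M myApplication.py")
```

With those (made-up) specs this prints a command requesting 29 executors of 5 cores each, which would spread work across all 10 core nodes instead of 4 executors.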
asked Jan 22 '17 by Kristian


1 Answer

Another way to see how many resources are being used by each node of the cluster is the Ganglia web tool.

It is published on the master node and shows a graph of each node's resource usage. The catch is that Ganglia must have been enabled as one of the available tools at cluster-creation time.

Once enabled, however, you can go to its web page and see how heavily each node is being utilized.
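If Ganglia wasn't enabled at creation time, the YARN ResourceManager on the master node also exposes cluster-wide utilization through its REST API (`GET http://<master-node>:8088/ws/v1/cluster/metrics`). A minimal sketch of reading that response — the payload below is a made-up sample in the documented shape; on a live cluster you would fetch it with `curl` or `urllib` instead:

```python
import json

# Made-up sample in the shape returned by the YARN ResourceManager
# /ws/v1/cluster/metrics endpoint; on a real cluster, fetch this over HTTP.
sample = json.loads("""
{"clusterMetrics": {
    "allocatedMB": 20480, "totalMB": 122880,
    "allocatedVirtualCores": 12, "totalVirtualCores": 80,
    "containersAllocated": 4, "activeNodes": 10}}
""")

m = sample["clusterMetrics"]
mem_util = m["allocatedMB"] / m["totalMB"]            # fraction of YARN memory in use
cpu_util = m["allocatedVirtualCores"] / m["totalVirtualCores"]
print(f"memory in use: {mem_util:.0%}, vcores in use: {cpu_util:.0%}, "
      f"containers: {m['containersAllocated']} on {m['activeNodes']} nodes")
```

Low `allocatedMB`/`allocatedVirtualCores` relative to the totals, with only a handful of containers allocated, is exactly the "not fully utilized" symptom the question describes.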

answered Sep 27 '22 by pquery