Running Spark on AWS EMR, how to run driver on master node?

It seems that by default EMR deploys the Spark driver to one of the CORE nodes, leaving the MASTER node virtually idle. Is it possible to run the driver program on the MASTER node instead? I have experimented with the --deploy-mode argument to no avail.

Here is my instance groups JSON definition:

[
  {
    "InstanceGroupType": "MASTER",
    "InstanceCount": 1,
    "InstanceType": "m3.xlarge",
    "Name": "Spark Master"
  },
  {
    "InstanceGroupType": "CORE",
    "InstanceCount": 3,
    "InstanceType": "m3.xlarge",
    "Name": "Spark Executors"
  }
]

Here is my configurations JSON definition:

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    },
    "Configurations": []
  },
  {
    "Classification": "spark-env",
    "Properties": {
    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
        },
        "Configurations": [
        ]
      }
    ]
  }
]

Here is my steps JSON definition:

[
  {
    "Name": "example",
    "Type": "SPARK",
    "Args": [
      "--class", "com.name.of.Class",
      "/home/hadoop/myjar-assembly-1.0.jar"
    ],
    "ActionOnFailure": "TERMINATE_CLUSTER"
  }
]

I am using aws emr create-cluster with --release-label emr-4.3.0.
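
For reference, the cluster is created with a command along these lines (the file names are illustrative):

aws emr create-cluster \
  --release-label emr-4.3.0 \
  --instance-groups file://instance-groups.json \
  --configurations file://configurations.json \
  --steps file://steps.json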

asked Feb 04 '16 by Landon Kuhn


1 Answer

I don't think the master node is wasted. When running Spark on EMR, the master node runs the YARN ResourceManager, the Livy server, and possibly other applications you selected. And if you submit in client mode, most of the driver program runs on the master node as well.
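To make the driver's placement explicit, a minimal sketch is to add --deploy-mode client to the step arguments; a SPARK-type step passes its Args through to spark-submit, and steps execute on the master node, so in client mode the driver ends up there:

[
  {
    "Name": "example",
    "Type": "SPARK",
    "Args": [
      "--deploy-mode", "client",
      "--class", "com.name.of.Class",
      "/home/hadoop/myjar-assembly-1.0.jar"
    ],
    "ActionOnFailure": "TERMINATE_CLUSTER"
  }
]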

Note that the driver program can be heavier than the tasks on the executors, e.g. when it collects all results from every executor, in which case you need to allocate enough resources to the master node if that is where the driver program is running.
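
One way to size the driver is through the spark-defaults classification; the 8g value below is only an illustration and should be matched to the master instance type:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.memory": "8g"
    }
  }
]

Keep in mind that the maximizeResourceAllocation setting from the question already computes a spark.driver.memory from the cluster's instance types, so an explicit value like this should take precedence over it.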

answered Sep 28 '22 by Z.Wei