I have a DynamoDB table that I need to query from Spark SQL on EMR. My EMR cluster uses release label emr-4.6.0 with Spark 1.6.1.
I am referring to the document: Analyse DynamoDB Data with Spark
After connecting to the master node, I run the command:
spark-shell --jars /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
It gives a warning:
Warning: Local jar /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar does not exist, skipping.
Later, when I import the DynamoDB input and output formats using
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
It gives the error:
error: object dynamodb is not a member of package org.apache.hadoop
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
error: object dynamodb is not a member of package org.apache.hadoop
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
I think it is the jar that is causing this error. Where do I get this emr-ddb-hadoop.jar?
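Before anything else, it is worth checking whether the connector jar actually exists anywhere on the master node, since the path can differ between EMR releases. A quick sketch (run over SSH on the master node; the filename pattern is an assumption):

```shell
# Check the documented location first
ls -l /usr/share/aws/emr/ddb/lib/

# If it is not there, search the filesystem for the connector jar
# (the exact filename may vary by EMR release)
sudo find / -name "emr-ddb-hadoop*.jar" 2>/dev/null
```

If the jar turns up under a different path, pass that path to `--jars` (or to the classpath settings below) instead.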
Don't use spark-shell --jars; instead, add the jar to the classpath in spark-defaults.conf:
spark.driver.extraClassPath /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
spark.executor.extraClassPath /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
After that, importing the DynamoDB input and output formats works:
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
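Once the imports resolve, a minimal sketch of reading the table into an RDD from the spark-shell, following the pattern in the AWS EMR DynamoDB connector docs (the table name and region here are placeholders):

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat

// JobConf carrying the connector settings; "MyTable" and the
// us-east-1 region are placeholders for your own table and region
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("dynamodb.input.tableName", "MyTable")
jobConf.set("dynamodb.servicename", "dynamodb")
jobConf.set("dynamodb.endpoint", "dynamodb.us-east-1.amazonaws.com")
jobConf.set("dynamodb.regionid", "us-east-1")

// Each record arrives as a DynamoDBItemWritable keyed by Text
val rows = sc.hadoopRDD(jobConf,
  classOf[DynamoDBInputFormat],
  classOf[Text],
  classOf[DynamoDBItemWritable])

rows.count()
```

From there you can map the `DynamoDBItemWritable` values into case classes and register them as a DataFrame for Spark SQL queries.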
The root cause of this problem is that emr-ddb-hadoop.jar is not available in the environment (at least not at the specified location). In order to install the DynamoDB libraries, you have to select Hadoop 2.7.2 along with your applications of interest when you create the Spark EMR cluster. Did you select that?
If not, launch a new cluster: go to the advanced options and make sure Hadoop 2.7.2 is selected along with the other applications.
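For reference, a hedged sketch of launching such a cluster from the AWS CLI rather than the console (the cluster name, key pair, and instance settings are placeholders); selecting both Hadoop and Spark as applications ensures the DynamoDB connector libraries get installed:

```shell
aws emr create-cluster \
  --name "spark-ddb-cluster" \
  --release-label emr-4.6.0 \
  --applications Name=Hadoop Name=Spark \
  --ec2-attributes KeyName=my-key-pair \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles
```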