
Getting emr-ddb-hadoop.jar to connect DynamoDB with EMR Spark

I have a DynamoDB table that I need to connect to EMR Spark SQL so that I can run queries on it. I have an EMR Spark cluster with release label emr-4.6.0 and Spark 1.6.1 on it.

I am referring to the document: Analyse DynamoDB Data with Spark

After connecting to the master node, I run the command:

spark-shell --jars /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar

It gives a warning:

Warning: Local jar /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar does not exist, skipping.

Later, when I import the DynamoDB Input Format using

import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat

It gives the error:

 error: object dynamodb is not a member of package org.apache.hadoop
     import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
 error: object dynamodb is not a member of package org.apache.hadoop
     import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat

I think it is the jar that is causing this error. Where do I get this emr-ddb-hadoop.jar?

Shweta asked May 05 '16 21:05


2 Answers

Don't use spark-shell --jars. Instead, add the jar to the classpath settings in spark-defaults.conf:

spark.driver.extraClassPath  /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
spark.executor.extraClassPath /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar

After that, importing the DynamoDB input and output formats works:

import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
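
If the imports still fail, it may be worth checking that both classpath entries actually made it into the config file. A minimal shell sketch: `has_ddb_classpath` is a hypothetical helper name, and the config path `/etc/spark/conf/spark-defaults.conf` is an assumption about where EMR keeps the Spark config, so verify it on your cluster.

```shell
# has_ddb_classpath is a hypothetical helper: it succeeds only if the given
# config file has both a driver and an executor extraClassPath line that
# mentions emr-ddb-hadoop.jar.
has_ddb_classpath() {
  conf="$1"
  grep -Eq '^spark\.driver\.extraClassPath[[:space:]].*emr-ddb-hadoop\.jar' "$conf" &&
  grep -Eq '^spark\.executor\.extraClassPath[[:space:]].*emr-ddb-hadoop\.jar' "$conf"
}

# Assumed location of the Spark config on EMR; adjust if yours differs.
CONF=/etc/spark/conf/spark-defaults.conf
if [ -f "$CONF" ] && has_ddb_classpath "$CONF"; then
  echo "classpath entries present"
else
  echo "classpath entries not found in $CONF"
fi
```

The pattern also matches when the jar is appended to an existing colon-separated classpath rather than being the only entry.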
Joe Chien answered Sep 29 '22 08:09


The root cause of this problem is that emr-ddb-hadoop.jar is not available in the environment (or at the location specified). To install the DynamoDB libraries, you have to select Hadoop 2.7.2 along with your applications of interest when you create the Spark EMR cluster. Did you select it?

If not, launch a new cluster, go to the advanced options, and make sure Hadoop 2.7.2 is selected along with the other applications.
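
Before relaunching, it may help to confirm on the master node whether the jar is really missing. A minimal shell sketch: `check_jar` is a hypothetical helper name, and the path is the one given in the question.

```shell
# check_jar is a hypothetical helper: it reports whether a regular file
# exists at the given path.
check_jar() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Path taken from the question; adjust if your cluster lays files out differently.
check_jar /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
```

If this reports the jar as missing, that matches the "does not exist, skipping" warning from spark-shell.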

user479151 answered Sep 29 '22 06:09