I'm trying to use Spark to read a file from Amazon S3 (as a DataFrame or an RDD), apply some simple transformations, and then write the result to a table in DynamoDB.
After reading a few other forum posts, I've come to understand that reading from/writing to DynamoDB requires a hadoopRDD, which is different from a regular Spark RDD, and different from the one I get when retrieving my S3 file.
How would I go about converting a DataFrame/RDD built from a file in S3 into a hadoopRDD so I can send it back up?
I'm using Scala and testing everything in spark-shell.
Thanks again in advance!
You can use the EMR DynamoDB Connector implemented by Amazon. It provides both a DynamoDBInputFormat and a DynamoDBOutputFormat, which let you read data from and write data to DynamoDB.
You can read more about it in this blog post.
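For example, here's a rough sketch of how a write from spark-shell could look with the connector on the classpath. The table name, region, S3 path, and attribute names are placeholders you'd swap for your own, and it assumes the input is a simple CSV:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.mapred.JobConf
import com.amazonaws.services.dynamodbv2.model.AttributeValue
import scala.collection.JavaConverters._

// JobConf carrying the connector's settings; built on the existing
// Hadoop configuration so your S3 credentials/settings carry over.
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("dynamodb.output.tableName", "myTable")  // placeholder table name
jobConf.set("dynamodb.regionid", "us-east-1")        // placeholder region
jobConf.set("mapred.output.format.class",
  "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")

// Read the S3 file as a plain RDD, transform it, then map each row into
// the (Text, DynamoDBItemWritable) pairs the output format expects.
val toWrite = sc.textFile("s3://my-bucket/my-file.csv")  // placeholder path
  .map(_.split(","))
  .map { fields =>
    val item = new DynamoDBItemWritable()
    item.setItem(Map(
      "id"   -> new AttributeValue(fields(0)),  // placeholder attribute names
      "name" -> new AttributeValue(fields(1))
    ).asJava)
    (new Text(""), item)  // the key is ignored by the output format
  }

// saveAsHadoopDataset pushes the pairs through DynamoDBOutputFormat
// into the DynamoDB table.
toWrite.saveAsHadoopDataset(jobConf)

// To read the table back instead, point the input side at the table and
// ask Spark for a hadoopRDD of (Text, DynamoDBItemWritable) pairs.
jobConf.set("dynamodb.input.tableName", "myTable")
jobConf.set("mapred.input.format.class",
  "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
val fromDynamo = sc.hadoopRDD(jobConf, classOf[DynamoDBInputFormat],
  classOf[Text], classOf[DynamoDBItemWritable])
```

This is the answer to your conversion question: you don't convert an RDD into a hadoopRDD directly; you map your existing RDD into (Text, DynamoDBItemWritable) pairs and let saveAsHadoopDataset hand them to the connector's output format, while sc.hadoopRDD is only needed on the read side.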