 

Writing From Spark to DynamoDB

I'm trying to use Spark to grab a file from Amazon S3 (as a DataFrame or an RDD), do some simple transformations, and then send the data to a table in DynamoDB.

After reading a few other forum posts, I've come to understand that reading from and writing to DynamoDB requires using a hadoopRDD - which is different from a regular Spark RDD, and different from the one I'm using to retrieve my S3 file.

How would I go about converting a DataFrame/RDD built from a file in S3 into a hadoopRDD so I can send it back up?

I'm using Scala and testing everything out in spark-shell.

Thanks again in advance!

Willks asked May 25 '16

1 Answer

You can use the EMR DynamoDB Connector implemented by Amazon. It provides both DynamoDBInputFormat and DynamoDBOutputFormat, which let you read data from and write data to DynamoDB.

You can read more about this in this blog post.
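
For reference, here is a minimal sketch of what the write path might look like in spark-shell, assuming the emr-dynamodb-hadoop jar (and the AWS SDK it depends on) is on the classpath. The table name, region, endpoint, S3 path, and column names ("id", "value") are placeholders to adapt to your own data, and the configuration keys follow the connector's published examples:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import com.amazonaws.services.dynamodbv2.model.AttributeValue
import scala.collection.JavaConverters._

// Hadoop job configuration for the connector's output format.
// Table name, endpoint, and region below are placeholders.
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("dynamodb.servicename", "dynamodb")
jobConf.set("dynamodb.output.tableName", "myTable")
jobConf.set("dynamodb.endpoint", "dynamodb.us-east-1.amazonaws.com")
jobConf.set("dynamodb.regionid", "us-east-1")
jobConf.set("mapred.output.format.class",
  "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")

// Read the source file from S3 as usual (Spark 2.x SparkSession shown;
// use sqlContext on Spark 1.x).
val df = spark.read.option("header", "true").csv("s3://my-bucket/my-file.csv")

// Convert each row into the (Text, DynamoDBItemWritable) pair that
// DynamoDBOutputFormat expects. "id" and "value" are example column names.
val dynamoRdd = df.rdd.map { row =>
  val attrs = Map(
    "id"    -> new AttributeValue().withS(row.getAs[String]("id")),
    "value" -> new AttributeValue().withS(row.getAs[String]("value"))
  ).asJava

  val item = new DynamoDBItemWritable()
  item.setItem(attrs)
  (new Text(""), item)
}

// Write the pair RDD to DynamoDB through the Hadoop output format.
dynamoRdd.saveAsHadoopDataset(jobConf)
```

Reading works the same way in reverse: build a JobConf with dynamodb.input.tableName set and call sc.hadoopRDD with DynamoDBInputFormat to get an RDD of DynamoDBItemWritable values.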

Ivan Mushketyk answered Nov 15 '22