I'm trying to use Spark to read a file from Amazon S3 (as a DataFrame or an RDD), apply some simple transformations, and then write the result to a table in DynamoDB.
After reading a few other forum posts, I've come to understand that reading from/writing to DynamoDB requires a hadoopRDD, which is different from a regular Spark RDD, and different from the one I get when retrieving my S3 file.
How would I go about converting a DataFrame/RDD built from a file in S3 into a hadoopRDD so I can send it back up?
I'm using Scala and testing everything in spark-shell.
Thanks again in advance!
You can use the EMR DynamoDB Connector implemented by Amazon. It provides both a DynamoDBInputFormat and a DynamoDBOutputFormat, which let you read data from and write data to DynamoDB.
You can read more about it in this blog post.
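For example, here's a rough sketch of how a write from spark-shell could look with the connector on the classpath. The table name, region, S3 path, and attribute names are placeholders you'd swap for your own, and it assumes the input is a simple CSV:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.mapred.JobConf
import com.amazonaws.services.dynamodbv2.model.AttributeValue
import scala.collection.JavaConverters._

// JobConf carrying the connector's settings; built on the existing
// Hadoop configuration so your S3 credentials/settings carry over.
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("dynamodb.output.tableName", "myTable")  // placeholder table name
jobConf.set("dynamodb.regionid", "us-east-1")        // placeholder region
jobConf.set("mapred.output.format.class",
  "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")

// Read the S3 file as a plain RDD, transform it, then map each row into
// the (Text, DynamoDBItemWritable) pairs the output format expects.
val toWrite = sc.textFile("s3://my-bucket/my-file.csv")  // placeholder path
  .map(_.split(","))
  .map { fields =>
    val item = new DynamoDBItemWritable()
    item.setItem(Map(
      "id"   -> new AttributeValue(fields(0)),  // placeholder attribute names
      "name" -> new AttributeValue(fields(1))
    ).asJava)
    (new Text(""), item)  // the key is ignored by the output format
  }

// saveAsHadoopDataset pushes the pairs through DynamoDBOutputFormat
// into the DynamoDB table.
toWrite.saveAsHadoopDataset(jobConf)

// To read the table back instead, point the input side at the table and
// ask Spark for a hadoopRDD of (Text, DynamoDBItemWritable) pairs.
jobConf.set("dynamodb.input.tableName", "myTable")
jobConf.set("mapred.input.format.class",
  "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
val fromDynamo = sc.hadoopRDD(jobConf, classOf[DynamoDBInputFormat],
  classOf[Text], classOf[DynamoDBItemWritable])
```

This is the answer to your conversion question: you don't convert an RDD into a hadoopRDD directly; you map your existing RDD into (Text, DynamoDBItemWritable) pairs and let saveAsHadoopDataset hand them to the connector's output format, while sc.hadoopRDD is only needed on the read side.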