We are designing a big data solution for one of our dashboard applications and are seriously considering AWS Glue for our initial ETL. Glue currently supports JDBC and S3 as targets, but our downstream services and components will work better with DynamoDB. We are wondering what the best approach is to eventually move the records from Glue to DynamoDB.
Should we write to S3 first and then run Lambdas to insert the data into DynamoDB? Is that the best practice? Or should we use a third-party JDBC wrapper for DynamoDB and have Glue write to DynamoDB directly (not sure if this is even possible, and it sounds a bit scary)? Or should we do something else?
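For context, the S3-first option we have in mind would be something like the Lambda sketch below. The bucket, table name, and JSON-lines file format are placeholder assumptions; nothing is settled yet.

import json
import boto3

s3 = boto3.client('s3')
table = boto3.resource('dynamodb').Table('dashboard-records')  # placeholder table

def handler(event, context):
    # Triggered by S3 object-created events; each object is assumed to be JSON lines.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        for line in body.decode('utf-8').splitlines():
            table.put_item(Item=json.loads(line))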
Any help is greatly appreciated. Thanks!
You can now crawl your Amazon DynamoDB tables, extract associated metadata, and add it to the AWS Glue Data Catalog.
AWS Glue supports writing data into another AWS account's DynamoDB table.
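For example, once a crawler has registered the DynamoDB table in the Data Catalog, an ETL job can read it back through the catalog entry. A minimal sketch, assuming placeholder database and table names:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the crawled DynamoDB table via its Data Catalog entry.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_catalog_db",   # placeholder catalog database
    table_name="pceg_ae_test"   # placeholder catalog table
)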
You can add the following lines to your Glue ETL script:
from awsglue.dynamicframe import DynamicFrame

glueContext.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glueContext, "final_df"),
    connection_type="dynamodb",
    connection_options={"tableName": "pceg_ae_test"}
)
Here df is a Spark DataFrame; DynamicFrame.fromDF converts it into the DynamicFrame that write_dynamic_frame expects.
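If the job throttles the target table, the DynamoDB connection options also let you cap how much write capacity it consumes. A sketch with the same placeholder table name:

glueContext.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glueContext, "final_df"),
    connection_type="dynamodb",
    connection_options={
        "tableName": "pceg_ae_test",
        # Fraction of the table's write capacity the job may consume.
        "dynamodb.throughput.write.percent": "0.5"
    }
)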
I am able to write using boto3... It's definitely not the best approach for a bulk load, but it works. :)
import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('BULK_DELIVERY')

print("Start testing")
for row in df1.rdd.collect():
    var1 = row.sourceCid
    print(var1)
    table.put_item(Item={'SOURCECID': "{}".format(var1)})
print("End testing")