I have written a program that writes items into a DynamoDB table. Now I would like to read all items from that table using PySpark. Are there any libraries available to do this in Spark?
The simplest way for Spark to interact with DynamoDB is through a connector that implements Hadoop's InputFormat/OutputFormat interfaces. Amazon EMR provides such a connector as part of emr-ddb-hadoop (open-sourced as awslabs/emr-dynamodb-connector).
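On an EMR cluster with the connector jar on the classpath (e.g. passed via `--jars /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar`), a minimal PySpark read might look like the sketch below. The table name, region, and endpoint are placeholder assumptions you would replace with your own:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Placeholder settings -- adjust the table name, endpoint, and region to yours.
conf = {
    "dynamodb.servicename": "dynamodb",
    "dynamodb.input.tableName": "MyTable",
    "dynamodb.endpoint": "dynamodb.us-east-1.amazonaws.com",
    "dynamodb.regionid": "us-east-1",
}

# Each record is a (Text, DynamoDBItemWritable) pair produced by the connector.
rdd = sc.hadoopRDD(
    inputFormatClass="org.apache.hadoop.dynamodb.read.DynamoDBInputFormat",
    keyClass="org.apache.hadoop.io.Text",
    valueClass="org.apache.hadoop.dynamodb.DynamoDBItemWritable",
    conf=conf,
)

print(rdd.count())
```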
You can use the parallel scans available in the DynamoDB API through boto3, together with a scheme like the parallel S3 file processing application written for PySpark described here. Basically, instead of reading all the keys a priori, just create a list of segment numbers and hard-code the maximum number of scan segments in the map_func function for Spark.
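As a rough illustration of that idea, here is a minimal sketch in which `scan_segment` plays the role of the `map_func` mentioned above. The table name, region, and segment count are placeholder assumptions, and boto3 must be installed on the executors:

```python
import boto3
from pyspark.sql import SparkSession

TOTAL_SEGMENTS = 8      # hard-coded max number of scan segments (assumption; tune it)
TABLE_NAME = "MyTable"  # placeholder table name
REGION = "us-east-1"    # placeholder region

def scan_segment(segment):
    """Runs on an executor: scans one segment of the table, following pagination."""
    table = boto3.resource("dynamodb", region_name=REGION).Table(TABLE_NAME)
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    while True:
        resp = table.scan(**kwargs)
        for item in resp["Items"]:
            yield item
        if "LastEvaluatedKey" not in resp:
            break
        kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]

spark = SparkSession.builder.appName("ddb-parallel-scan").getOrCreate()

# One Spark task per DynamoDB scan segment; flatMap flattens the per-segment items.
items = (spark.sparkContext
         .parallelize(range(TOTAL_SEGMENTS), TOTAL_SEGMENTS)
         .flatMap(scan_segment))

print(items.count())
```

Creating the boto3 resource inside `scan_segment` is deliberate: boto3 clients are not serializable, so each Spark task builds its own connection.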