My table is around 220 MB with 250k records in it. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process that loops through the table, but I'm not sure how to make each batch start where the previous one left off.
Is there some way to filter my scan? From what I've read, filtering occurs after the data is loaded, and loading stops at 1 MB, so a filter alone wouldn't let me scan in new objects.
Any assistance would be appreciated.
    import boto3

    dynamodb = boto3.resource(
        'dynamodb',
        aws_session_token=aws_session_token,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=region
    )

    table = dynamodb.Table('widgetsTableName')
    data = table.scan()
Query Tables in DynamoDB using Boto3
To query items in DynamoDB, you can use the query() method to fetch items based on primary key values. In addition, you can use the KeyConditionExpression to specify the value of the partition key and return all items from the table with that partition key.
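As a rough sketch of what a query() call could look like: the helper below collects every item for one partition key, following LastEvaluatedKey in case the result spans several pages. The name query_all is made up for this example, the table argument is assumed to be a boto3 Table resource, and "widget_id" in the comment is a hypothetical partition key name, not something taken from the question.

```python
def query_all(table, key_condition):
    """Return every item matching a key condition, across all result pages."""
    kwargs = {"KeyConditionExpression": key_condition}
    items = []
    while True:
        response = table.query(**kwargs)
        items.extend(response["Items"])
        # LastEvaluatedKey is only present when more pages remain.
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        kwargs["ExclusiveStartKey"] = last_key


# Example call against a real table (needs boto3 and AWS credentials);
# "widget_id" is a hypothetical partition key name:
#
#     from boto3.dynamodb.conditions import Key
#     widgets = query_all(table, Key("widget_id").eq("w-123"))
```

Note that a query only helps when you know the partition key value up front; pulling the whole table still requires a scan.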
Scans are, generally speaking, slow. To make the process faster, you can use a feature called "Parallel Scans", which divides the whole DynamoDB table into segments. A separate thread/worker then processes each segment, so N workers can work simultaneously to go through the whole keyspace faster.
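A minimal sketch of a parallel scan, assuming one thread per segment. Segment and TotalSegments are real scan() parameters; the helper names scan_segment and parallel_scan and the make_table factory are inventions for this example. The factory is there because boto3 resources are not thread safe, so each worker builds its own table object rather than sharing one.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(make_table, segment, total_segments):
    """Scan one segment to completion, following LastEvaluatedKey."""
    table = make_table()  # one table object per worker thread
    kwargs = {"Segment": segment, "TotalSegments": total_segments}
    items = []
    while True:
        response = table.scan(**kwargs)
        items.extend(response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(make_table, total_segments=4):
    """Scan every segment in its own thread and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, make_table, segment, total_segments)
            for segment in range(total_segments)
        ]
        items = []
        for future in futures:
            items.extend(future.result())
    return items


# Possible factory for a real table (needs boto3 and AWS credentials):
#
#     import boto3
#     def make_table():
#         session = boto3.session.Session()
#         return session.resource("dynamodb").Table("widgetsTableName")
#
#     data = parallel_scan(make_table, total_segments=4)
```

The segment count of 4 is arbitrary. More segments consume more read capacity at once, so the right number depends on the table's throughput settings.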
I think the Amazon DynamoDB documentation regarding table scanning answers your question.
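In short, each scan call returns at most 1 MB of data, so you'll need to check for LastEvaluatedKey in the response. If it is present, there is more data to read, and you pass its value as ExclusiveStartKey to the next scan call so it picks up where the previous one stopped. Here is an example using your code: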
    import boto3

    dynamodb = boto3.resource(
        'dynamodb',
        aws_session_token=aws_session_token,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=region
    )

    table = dynamodb.Table('widgetsTableName')

    # The first call returns the first page of items (up to 1 MB).
    response = table.scan()
    data = response['Items']

    # Keep scanning from the last evaluated key until no pages remain.
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])
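On the filtering part of the question: the same loop works with a filter. A FilterExpression is applied after each page of up to 1 MB has been read, so a page can come back with few or even zero items and still carry a LastEvaluatedKey; the loop simply keeps going. Here is a sketch of that idea as a generator, which also avoids building one big list if you want to process items as they arrive. The name scan_all is made up here, and "status" in the comment is a hypothetical attribute, not one from your table.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a scan, following LastEvaluatedKey across pages."""
    kwargs = dict(scan_kwargs)
    while True:
        response = table.scan(**kwargs)
        # A filtered page may be empty even though more pages remain.
        yield from response["Items"]
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key


# Example call against a real table (needs boto3 and AWS credentials);
# "status" is a hypothetical attribute name:
#
#     from boto3.dynamodb.conditions import Attr
#     active = list(scan_all(table, FilterExpression=Attr("status").eq("active")))
```

Keep in mind that a filter reduces what is returned, not what is read, so a filtered scan still reads the whole table.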