
Complete scan of DynamoDB with boto3

My table is around 220 MB with 250k records in it. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process run in a loop, but I'm not sure how to make each batch start where the previous one left off.

Is there some way to filter my scan? From what I've read, filtering occurs after loading, and loading stops at 1 MB, so I wouldn't actually be able to scan in new objects.

Any assistance would be appreciated.

import boto3

dynamodb = boto3.resource(
    'dynamodb',
    aws_session_token=aws_session_token,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region,
)

table = dynamodb.Table('widgetsTableName')

data = table.scan()
CJ_Spaz asked Apr 21 '16 21:04




1 Answer

I think the Amazon DynamoDB documentation regarding table scanning answers your question.

In short, you'll need to check for LastEvaluatedKey in the response. Here is an example using your code:

import boto3

dynamodb = boto3.resource(
    'dynamodb',
    aws_session_token=aws_session_token,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region,
)

table = dynamodb.Table('widgetsTableName')

# The first call returns at most 1 MB of data.
response = table.scan()
data = response['Items']

# Keep scanning from where the previous page stopped until no key is returned.
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
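On the filtering part of the question: a filter is applied after each 1 MB read, so a filtered page can come back with few or no items while still carrying a LastEvaluatedKey. The loop therefore has to key off LastEvaluatedKey, not off whether Items is empty. Below is a minimal sketch of the same loop wrapped in a generator that passes any extra scan arguments through. The names scan_all and FakeTable are hypothetical and not part of boto3; FakeTable is only an in-memory stand-in so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, Iterator, List, Optional


def scan_all(table: Any, **scan_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a table by following LastEvaluatedKey.

    `table` is assumed to behave like a boto3 DynamoDB Table resource,
    i.e. it exposes scan(**kwargs) and returns a dict with 'Items' and,
    when more pages remain, 'LastEvaluatedKey'.
    """
    kwargs = dict(scan_kwargs)
    while True:
        response = table.scan(**kwargs)
        # With a filter set, a page may legitimately contain zero items.
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a boto3 Table, used only for local runs."""

    def __init__(self, items: List[Dict[str, Any]], page_size: int) -> None:
        self._items = items
        self._page_size = page_size

    def scan(self, ExclusiveStartKey: Optional[Dict[str, int]] = None,
             **_ignored: Any) -> Dict[str, Any]:
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"]
        end = start + self._page_size
        response: Dict[str, Any] = {"Items": self._items[start:end]}
        if end < len(self._items):
            # Mimic DynamoDB: only present when more data remains.
            response["LastEvaluatedKey"] = {"index": end}
        return response


if __name__ == "__main__":
    table = FakeTable([{"id": i} for i in range(7)], page_size=3)
    data = list(scan_all(table))
    print(len(data))
```

With the real table from the answer, the call would presumably look like data = list(scan_all(table)), and a FilterExpression could be passed as a keyword argument. Iterating over the generator instead of building a list avoids holding all 250k records in memory at once.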
Tay B answered Sep 17 '22 10:09