I can't seem to get this right. I want to do a scan of a table and only return records where a particular field does not exist.
I've tried the following two things:
HashMap<String, Condition> scanFilter = new HashMap();
Condition scanFilterCondition = new Condition().withComparisonOperator(ComparisonOperator.NULL.toString());
scanFilter.put("field", scanFilterCondition);
ScanRequest scan = new ScanRequest()
.withTableName("table name")
.withScanFilter(scanFilter)
etc
and
ScanRequest scan = new ScanRequest()
.withTableName("table")
.withFilterExpression("attribute_not_exists(attributeName)")
.withLimit(100)
etc
However they return no records (and most records are missing this field). Note, that if I remove the filter the scan does return and process all records as expected so the basic query is correct. How do I do this?
EDIT added full method in case it helps// Get information on the table so that we can set the read capacity for the operation.
List<String> tables = client.listTables().getTableNames();
String tableName = tables.stream().filter(table -> table.equals(configuration.getTableName())).findFirst().get();
if(Strings.isNullOrEmpty(tableName))
return 0;
TableDescription table = client.describeTable(tableName).getTable();
//Set the rate limit to a third of the provisioned read capacity.
int rateLimit = (int) (table.getProvisionedThroughput().getReadCapacityUnits() / 3);
RateLimiter rateLimiter = RateLimiter.create(rateLimit);
// Track how much throughput we consume on each page
int permitsToConsume = 1;
// Initialize the pagination token
Map<String, AttributeValue> exclusiveStartKey = null;
int count = 1;
int writtenCount = 0;
do {
// Let the rate limiter wait until our desired throughput "recharges"
rateLimiter.acquire(permitsToConsume);
//We only want to process records that don't have the field key set.
HashMap<String, Condition> scanFilter = new HashMap<>();
Condition scanFilterCondition = new Condition().withComparisonOperator(ComparisonOperator.NULL.toString());
scanFilter.put("field", scanFilterCondition);
ScanRequest scan = new ScanRequest()
.withTableName(configuration.getNotificationsTableName())
.withScanFilter(scanFilter)
.withLimit(100)
.withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
.withExclusiveStartKey(exclusiveStartKey);
ScanResult result = client.scan(scan);
exclusiveStartKey = result.getLastEvaluatedKey();
// Account for the rest of the throughput we consumed,
// now that we know how much that scan request cost
double consumedCapacity = result.getConsumedCapacity().getCapacityUnits();
permitsToConsume = (int)(consumedCapacity - 1.0);
if(permitsToConsume <= 0) {
permitsToConsume = 1;
}
// Process results here
} while (exclusiveStartKey != null);
The NULL condition seems to be fine. You need to do recursive search using Scan. The Dynamodb scan doesn't scan the whole database in one go. It scans the data based on the amount of consumed provisioned throughput.
Sample code to perform scan in loop based on LastEvaluatedKey:-
ScanResult result = null;
do {
HashMap<String, Condition> scanFilter = new HashMap<>();
Condition scanFilterCondition = new Condition().withComparisonOperator(ComparisonOperator.NULL);
scanFilter.put("title", scanFilterCondition);
ScanRequest scanRequest = new ScanRequest().withTableName(tableName).withScanFilter(scanFilter);
if (result != null) {
scanRequest.setExclusiveStartKey(result.getLastEvaluatedKey());
}
result = dynamoDBClient.scan(scanRequest);
LOGGER.info("Number of records ==============>" + result.getItems().size());
for (Map<String, AttributeValue> item : result.getItems()) {
LOGGER.info("Movies ==================>" + item.get("title"));
}
} while (result.getLastEvaluatedKey() != null);
NULL : The attribute does not exist. NULL is supported for all data types, including lists and maps. Note This operator tests for the nonexistence of an attribute, not its data type. If the data type of attribute "a" is null, and you evaluate it using NULL, the result is a Boolean false. This is because the attribute "a" exists; its data type is not relevant to the NULL comparison operator.
LastEvaluatedKey The primary key of the item where the operation stopped, inclusive of the previous result set. Use this value to start a new operation, excluding this value in the new request.
If LastEvaluatedKey is empty, then the "last page" of results has been processed and there is no more data to be retrieved.
If LastEvaluatedKey is not empty, it does not necessarily mean that there is more data in the result set. The only way to know when you have reached the end of the result set is when LastEvaluatedKey is empty.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With