Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dynamodb scan: filter all records where attribute does not exist

I can't seem to get this right. I want to do a scan of a table and only return records where a particular field does not exist.

I've tried the following two things:

HashMap<String, Condition> scanFilter = new HashMap();
Condition scanFilterCondition = new Condition().withComparisonOperator(ComparisonOperator.NULL.toString());
scanFilter.put("field", scanFilterCondition);

ScanRequest scan = new ScanRequest()
    .withTableName("table name")
    .withScanFilter(scanFilter)
etc

and

ScanRequest scan = new ScanRequest()
          .withTableName("table")
          .withFilterExpression("attribute_not_exists(attributeName)")
          .withLimit(100)
          etc

However they return no records (and most records are missing this field). Note, that if I remove the filter the scan does return and process all records as expected so the basic query is correct. How do I do this?

EDIT added full method in case it helps
// Get information on the table so that we can set the read capacity for the operation.
List<String> tables = client.listTables().getTableNames();
String tableName = tables.stream().filter(table -> table.equals(configuration.getTableName())).findFirst().get();
if(Strings.isNullOrEmpty(tableName))
  return 0;
TableDescription table = client.describeTable(tableName).getTable();

//Set the rate limit to a third of the provisioned read capacity.
int rateLimit = (int) (table.getProvisionedThroughput().getReadCapacityUnits() / 3);
RateLimiter rateLimiter = RateLimiter.create(rateLimit);
// Track how much throughput we consume on each page
int permitsToConsume = 1;
// Initialize the pagination token
Map<String, AttributeValue> exclusiveStartKey = null;
int count = 1;
int writtenCount = 0;

do {
  // Let the rate limiter wait until our desired throughput "recharges"
  rateLimiter.acquire(permitsToConsume);

  //We only want to process records that don't have the field key set.
  HashMap<String, Condition> scanFilter = new HashMap<>();
  Condition scanFilterCondition = new Condition().withComparisonOperator(ComparisonOperator.NULL.toString());
  scanFilter.put("field", scanFilterCondition);

  ScanRequest scan = new ScanRequest()
      .withTableName(configuration.getNotificationsTableName())
      .withScanFilter(scanFilter)
      .withLimit(100)
      .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
      .withExclusiveStartKey(exclusiveStartKey);

  ScanResult result = client.scan(scan);
  exclusiveStartKey = result.getLastEvaluatedKey();

  // Account for the rest of the throughput we consumed,
  // now that we know how much that scan request cost
  double consumedCapacity = result.getConsumedCapacity().getCapacityUnits();
  permitsToConsume = (int)(consumedCapacity - 1.0);
  if(permitsToConsume <= 0) {
    permitsToConsume = 1;
    }

  // Process results here
} while (exclusiveStartKey != null);
like image 847
mark Avatar asked Sep 15 '16 07:09

mark


1 Answers

The NULL condition seems to be fine. You need to do recursive search using Scan. The Dynamodb scan doesn't scan the whole database in one go. It scans the data based on the amount of consumed provisioned throughput.

Sample code to perform scan in loop based on LastEvaluatedKey:-

ScanResult result = null;

            do {
                HashMap<String, Condition> scanFilter = new HashMap<>();
                Condition scanFilterCondition = new Condition().withComparisonOperator(ComparisonOperator.NULL);
                scanFilter.put("title", scanFilterCondition);

                ScanRequest scanRequest = new ScanRequest().withTableName(tableName).withScanFilter(scanFilter);
                if (result != null) {
                    scanRequest.setExclusiveStartKey(result.getLastEvaluatedKey());
                }

                result = dynamoDBClient.scan(scanRequest);

                LOGGER.info("Number of records ==============>" + result.getItems().size());

                for (Map<String, AttributeValue> item : result.getItems()) {
                    LOGGER.info("Movies ==================>" + item.get("title"));
                }
            } while (result.getLastEvaluatedKey() != null);

NULL : The attribute does not exist. NULL is supported for all data types, including lists and maps. Note This operator tests for the nonexistence of an attribute, not its data type. If the data type of attribute "a" is null, and you evaluate it using NULL, the result is a Boolean false. This is because the attribute "a" exists; its data type is not relevant to the NULL comparison operator.

LastEvaluatedKey The primary key of the item where the operation stopped, inclusive of the previous result set. Use this value to start a new operation, excluding this value in the new request.

If LastEvaluatedKey is empty, then the "last page" of results has been processed and there is no more data to be retrieved.

If LastEvaluatedKey is not empty, it does not necessarily mean that there is more data in the result set. The only way to know when you have reached the end of the result set is when LastEvaluatedKey is empty.

like image 161
notionquest Avatar answered Sep 28 '22 02:09

notionquest