Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why DynamoDB scan with Limit and FilterExpression not return the items that match the filter requirements?

I need make a scan with limit and a condition on DynamoDB.

The docs says:

In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).


The code (NODEJS):

var params = {
    ExpressionAttributeNames: {"#user": "User"},
    ExpressionAttributeValues: {":user": parseInt(user.id)},
    FilterExpression: "#user = :user and attribute_not_exists(Removed)",
    Limit: 2,
    TableName: "XXXX"
};

DynamoDB.scan(params, function(err, data) {
    if (err) {
        dataToSend.message = "Unable to query. Error: " + err.message;
    } else if (data.Items.length == 0) {
        dataToSend.message = "No results were found.";
    } else {
        dataToSend.data = data.Items;
        console.log(dataToSend);
    }
});



Table XXXX definitions:

  • Primary partition key: User (Number)
  • Primary sort key: Identifier (String)
  • INDEX:
    • Index Name: RemovedIndex
    • Type: GSI
    • Partition key: Removed (Number)
    • Sort key: -
    • Attributes: ALL


In code above, if I remove the Limit parameter, DynamoDB will return the items that match the filter requirements. So, the conditions are ok. But when I scan with Limit parameter, the result is empty.

The XXXX table, has 5 items. Only the 2 firsts have the Removed attribute. When I scan without Limit parameter, DynamoDB returns the 3 items without Removed attribute.

What i'm doing wrong?

like image 865
Gabriel Cunha Avatar asked Aug 04 '16 22:08

Gabriel Cunha


People also ask

Does DynamoDB Scan return all items?

A Scan operation in Amazon DynamoDB reads every item in a table or a secondary index. By default, a Scan operation returns all of the data attributes for every item in the table or index. You can use the ProjectionExpression parameter so that Scan only returns some of the attributes, rather than all of them.

What does DynamoDB Scan return?

The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index. To have DynamoDB return fewer items, you can provide a FilterExpression operation.

How does DynamoDB limit work?

DynamoDB supports up to five requests per second for each of these APIs. The result set from a Query is limited to 1 MB per call. You can use the LastEvaluatedKey from the query response to retrieve more results. The result set from a Scan is limited to 1 MB per call.


3 Answers

From the docs that you quoted:

If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements

By combining Limit and FilterExpression you have told DynamoDB to only look at the first two items in the table, and evaluate the FilterExpression against those items. Limit in DynamoDB can be confusing because it works differently from limit in a SQL expression in a RDBMS.

like image 173
Mark B Avatar answered Oct 23 '22 10:10

Mark B


Also ran into this issue, i guess you will just have to scan the whole table to a max of 1 MB

Scan The result set from a Scan is limited to 1 MB per call. You can use the LastEvaluatedKey from the scan response to retrieve more results.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html

like image 20
Samuel Okpapi Avatar answered Oct 23 '22 12:10

Samuel Okpapi


You might be able to get what you need by using a secondary index. Using the classic RDB example, customer - order example: you have one table for customers and one for orders. The Orders table has a Key consisting of Customer - HASH, Order - RANGE. So if you wanted to get the latest 10 orders, there would be no way to do it without a scan

But if you create a Global Secondary Index on orders of "Some Constant" -- HASH, Date RANGE, and queried against that index, they query would do what you want and only charge you for the RCUs involved with the records returned. No expensive scan needed. Note, writes will be more expensive, but in most cases, there are many more reads than writes.

Now you have your original problem if you want to get the 10 biggest orders for a day larger than $1000. The query would return the last 10 orders, and then filter out those less than $1000.

In this case, you could create a computed key of Date-OrderAmount, and queries against that index would return what you want.

It's not as simple as SQL, but you need to think about access patterns in SQL too. If if you have a lot of data, you need to create Indexes in SQL or the DB will happily to table scans on your behalf, which will impair performance and raise your costs.

Note that everything I proposed is normalized in the sense that there is only one source of truth. You are not duplicating data -- you are merely recasting views of it to get what you need from DynamoDB.

Bear in mind that the CONSTANT as a HASH s subject to the 10GB per partition limit, so you would need to design around it if you had a lot of active data. For example, depending on your expected access pattern, you could use Customer and not a constant as a HASH. Or use STreams to organize the data (or subsets) in other ways.

like image 43
Andy Brand Avatar answered Oct 23 '22 10:10

Andy Brand