Why does a DynamoDB scan with Limit and FilterExpression not return the items that match the filter requirements?

I need to run a scan with a limit and a condition on DynamoDB.

The docs say:

In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).

The code (Node.js):

var params = {
    // "User" is aliased as #user in the expression.
    ExpressionAttributeNames: {"#user": "User"},
    // Always pass a radix to parseInt so the id is parsed as decimal.
    ExpressionAttributeValues: {":user": parseInt(user.id, 10)},
    // Keep only this user's items that have no Removed attribute.
    FilterExpression: "#user = :user and attribute_not_exists(Removed)",
    Limit: 2,
    TableName: "XXXX"
};

DynamoDB.scan(params, function(err, data) {
    if (err) {
        dataToSend.message = "Unable to scan. Error: " + err.message;
    } else if (data.Items.length === 0) {
        dataToSend.message = "No results were found.";
    } else {
        dataToSend.data = data.Items;
        console.log(dataToSend);
    }
});

Table XXXX definitions:

Primary partition key: User (Number)
Primary sort key: Identifier (String)
INDEX:
- Index Name: RemovedIndex
- Type: GSI
- Partition key: Removed (Number)
- Sort key: -
- Attributes: ALL

In the code above, if I remove the Limit parameter, DynamoDB returns the items that match the filter requirements, so the conditions are OK. But when I scan with the Limit parameter, the result is empty.

The XXXX table has 5 items. Only the first 2 have the Removed attribute. When I scan without the Limit parameter, DynamoDB returns the 3 items without the Removed attribute.

What am I doing wrong?

asked Aug 04 '16 22:08

Gabriel Cunha

3 Answers

From the docs that you quoted:

If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements

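By combining Limit and FilterExpression you have told DynamoDB to read only the first two items in the table and then evaluate the FilterExpression against those two items. Limit caps the number of items evaluated, not the number of items returned. In your table the first two items both have the Removed attribute, so both are filtered out and the result is empty. Limit in DynamoDB can be confusing because it works differently from LIMIT in a SQL query against an RDBMS, where the limit is applied after the WHERE clause.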
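
To make the order of operations concrete, here is a minimal in-memory sketch. It is not the AWS SDK: simulateScan and the sample items are hypothetical and only model the behaviour described in the quoted docs, using five items shaped like the ones in the question.

```javascript
// Simplified model of how Scan applies Limit and FilterExpression.
// NOT the AWS SDK: simulateScan and the sample data are hypothetical.
function simulateScan(items, { limit, filter }) {
  // Step 1: Limit caps how many items are read (evaluated).
  const evaluated = limit === undefined ? items : items.slice(0, limit);
  // Step 2: the filter runs only on the items that were read.
  const matched = evaluated.filter(filter);
  return { Items: matched, Count: matched.length, ScannedCount: evaluated.length };
}

// Five items; only the first two carry the Removed attribute.
const table = [
  { User: 1, Identifier: "a", Removed: 1 },
  { User: 1, Identifier: "b", Removed: 1 },
  { User: 1, Identifier: "c" },
  { User: 1, Identifier: "d" },
  { User: 1, Identifier: "e" },
];

// Rough equivalent of "#user = :user and attribute_not_exists(Removed)".
const notRemoved = (item) => item.User === 1 && !("Removed" in item);

const withLimit = simulateScan(table, { limit: 2, filter: notRemoved });
const withoutLimit = simulateScan(table, { filter: notRemoved });

console.log(withLimit.Count, withLimit.ScannedCount);       // 0 2
console.log(withoutLimit.Count, withoutLimit.ScannedCount); // 3 5
```

A real Scan response also reports Count and ScannedCount, so comparing the two is a quick way to see how many items were read versus how many passed the filter.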

answered Oct 23 '22 10:10

Mark B

I also ran into this issue. I guess you just have to scan the whole table without Limit, up to the maximum of 1 MB per call, and then keep requesting pages. From the docs:

Scan: The result set from a Scan is limited to 1 MB per call. You can use the LastEvaluatedKey from the scan response to retrieve more results.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
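
One way to follow LastEvaluatedKey until enough matching items have been collected is sketched below. The helper name scanUntil and the injected scanPage function are assumptions made for illustration; with the AWS SDK v2 DocumentClient, scanPage could be something like (p) => docClient.scan(p).promise().

```javascript
// Collects up to `wanted` matching items by following LastEvaluatedKey.
// `scanPage` is a hypothetical async function: (params) => { Items, LastEvaluatedKey }.
async function scanUntil(scanPage, params, wanted) {
  const found = [];
  let startKey;
  do {
    // Only send ExclusiveStartKey once a previous page has returned one.
    const pageParams = startKey
      ? { ...params, ExclusiveStartKey: startKey }
      : params;
    const page = await scanPage(pageParams);
    found.push(...page.Items);
    // Undefined when the scan has reached the end of the table.
    startKey = page.LastEvaluatedKey;
  } while (startKey !== undefined && found.length < wanted);
  // A page can overshoot, so trim to the requested number of items.
  return found.slice(0, wanted);
}
```

Every page still consumes read capacity for all items it evaluates, whether or not they pass the filter, so this fixes the empty result but not the cost of scanning.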

answered Oct 23 '22 12:10

Samuel Okpapi

You might be able to get what you need by using a secondary index. Take the classic relational customer/order example: you have one table for customers and one for orders. The Orders table has a key consisting of Customer (HASH) and Order (RANGE). So if you wanted to get the latest 10 orders across all customers, there would be no way to do it without a scan.

But if you create a Global Secondary Index on Orders with "Some Constant" as the HASH key and Date as the RANGE key, and query against that index, the query would do what you want and only charge you for the RCUs involved with the records returned. No expensive scan needed. Note that writes will be more expensive, but in most cases there are many more reads than writes.

Now you have your original problem again if you want to get the 10 biggest orders for a day that are larger than $1000. The query would return the last 10 orders and then filter out those under $1000.

In this case, you could create a computed key of Date-OrderAmount, and queries against that index would return what you want.
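
A rough sketch of such a computed key and a matching Query request follows. All names here (Orders, ByDateAmount, Bucket, DateAmount, the "ALL" constant, and both helper functions) are made up for illustration, and the amount is assumed to be stored in cents and zero-padded so that string order agrees with numeric order.

```javascript
// Hypothetical composite sort key: "<date>#<zero-padded amount in cents>".
function dateAmountKey(date, amountCents) {
  return `${date}#${String(amountCents).padStart(12, "0")}`;
}

// Query parameters for "the 10 biggest orders on `date` of at least `minCents`".
// Table, index, and attribute names are assumptions, not from the original post.
function biggestOrdersParams(date, minCents) {
  return {
    TableName: "Orders",
    IndexName: "ByDateAmount",
    KeyConditionExpression: "#b = :b AND #k BETWEEN :lo AND :hi",
    ExpressionAttributeNames: { "#b": "Bucket", "#k": "DateAmount" },
    ExpressionAttributeValues: {
      ":b": "ALL", // the constant HASH value shared by every order
      ":lo": dateAmountKey(date, minCents),
      ":hi": dateAmountKey(date, 999999999999),
    },
    ScanIndexForward: false, // descending sort key order: largest amounts first
    Limit: 10,
  };
}

console.log(dateAmountKey("2016-08-04", 100000)); // 2016-08-04#000000100000
```

Because the amount threshold is part of the key condition and not a FilterExpression, Limit here counts only items that already satisfy it.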

It's not as simple as SQL, but you need to think about access patterns in SQL too. If you have a lot of data, you need to create indexes in SQL or the DB will happily do table scans on your behalf, which will impair performance and raise your costs.

Note that everything I proposed is normalized in the sense that there is only one source of truth. You are not duplicating data -- you are merely recasting views of it to get what you need from DynamoDB.

Bear in mind that the CONSTANT as a HASH is subject to the 10GB per partition limit, so you would need to design around it if you had a lot of active data. For example, depending on your expected access pattern, you could use Customer and not a constant as a HASH. Or use Streams to organize the data (or subsets) in other ways.

answered Oct 23 '22 10:10

Andy Brand

Tags:

node.js

amazon-web-services

amazon-dynamodb