Querying nested attributes in Amazon DynamoDB

How can I efficiently query on nested attributes in Amazon DynamoDB?

I have a document structure as below, which lets me store related information in the document itself (rather than referencing it).

It makes sense to store the seminars nested in the course, since they will likely be queried alongside the course (they are all course-specific, i.e. a course has many seminars, and a seminar belongs to a course).

In CouchDB, which I’m migrating from, I could write a View that would project some nested attributes for querying. I understand that I can’t project anything that isn’t a top-level attribute into a DynamoDB secondary index, so this approach doesn’t seem to work.

This brings me back to the question: how can I efficiently query on nested attributes without scanning, if I can’t use them as keys in an index?

For example, if I want to get average attendance at Nelson Mandela Theatre, how can I query for the values of registrations and attendees in all seminars that have a location of “Nelson Mandela Theatre” without resorting to a scan?

{
    "course_id": "ABC-1234567",
    "course_name": "Statistics 101",
    "tutors": ["Cognito-sub-1", "Cognito-sub-2"],
    "seminars": [
        {
            "seminar_id": "XXXYYY-12345",
            "epoch_time": "123456789",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "How to lie with statistics",
            "registrations": "92",
            "attendees": "61"
        },
        {
            "seminar_id": "BBBCCC-44444",
            "epoch_time": "155555555",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "Statistical significance for dog owners",
            "registrations": "244",
            "attendees": "240"
        },
        {
            "seminar_id": "XXXAAA-54321",
            "epoch_time": "223456789",
            "duration": "4000",
            "location": "Starbucks",
            "name": "Is feral cat population growth a leading indicator for the S&P 500?",
            "registrations": "40"
        }
    ]
}

{
    "course_id": "CJX-5553389",
    "course_name": "Cat Health 101",
    "tutors": ["Cognito-sub-4", "Cognito-sub-9"],
    "seminars": [
        {
            "seminar_id": "TTRHJK-43278",
            "epoch_time": "123456789",
            "duration": "5400",
            "location": "Catwoman Hall",
            "name": "Emotional support octopi for cats",
            "registrations": "88",
            "attendees": "87"
        },
        {
            "seminar_id": "BBBCCC-44444",
            "epoch_time": "123666789",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "Statistical significance for cat owners",
            "registrations": "44",
            "attendees": "44"
        }
    ]
}
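For reference, the aggregation the question asks for can be sketched in plain Python over the two sample documents (trimmed to the fields involved). The function name `location_stats` is illustrative only. With the nested model, code like this would have to run client-side over every course document returned by a Scan, which is exactly the cost the question wants to avoid. Seminars without an `attendees` value (such as the Starbucks one) are skipped.

```python
# Sketch only: walk the nested "seminars" list of every course document and
# total the numbers for one location. The sample data stores numbers as
# strings, so they are converted with int().


def location_stats(courses, location):
    """Return (seminar_count, total_registrations, total_attendees) for the
    seminars at `location` that have both registrations and attendees."""
    count = registrations = attendees = 0
    for course in courses:
        for seminar in course.get("seminars", []):
            if seminar.get("location") != location:
                continue
            if "registrations" not in seminar or "attendees" not in seminar:
                continue  # no attendance recorded for this seminar
            count += 1
            registrations += int(seminar["registrations"])
            attendees += int(seminar["attendees"])
    return count, registrations, attendees


# The two sample documents from the question, trimmed to the relevant fields.
courses = [
    {"course_id": "ABC-1234567", "seminars": [
        {"seminar_id": "XXXYYY-12345", "location": "Nelson Mandela Theatre",
         "registrations": "92", "attendees": "61"},
        {"seminar_id": "BBBCCC-44444", "location": "Nelson Mandela Theatre",
         "registrations": "244", "attendees": "240"},
        {"seminar_id": "XXXAAA-54321", "location": "Starbucks",
         "registrations": "40"},
    ]},
    {"course_id": "CJX-5553389", "seminars": [
        {"seminar_id": "TTRHJK-43278", "location": "Catwoman Hall",
         "registrations": "88", "attendees": "87"},
        {"seminar_id": "BBBCCC-44444", "location": "Nelson Mandela Theatre",
         "registrations": "44", "attendees": "44"},
    ]},
]

count, regs, att = location_stats(courses, "Nelson Mandela Theatre")
print(count, regs, att, att / count)  # → 3 380 345 115.0
```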
Harry asked May 24 '18 11:05


2 Answers

An index cannot be created on nested attributes (i.e. attributes inside the document data types in DynamoDB).

Document Types – A document type can represent a complex structure with nested attributes—such as you would find in a JSON document. The document types are list and map.

Query API:

A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.
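As a sketch of what a Query request looks like, the hypothetical helper below builds the parameter dict for the low-level DynamoDB API (the same shape that `boto3.client("dynamodb").query(**params)` accepts). The table and attribute names are assumptions. The partition key accepts only equality; the optional sort-key condition uses `begins_with`, one of the comparison operators allowed on sort keys.

```python
# Sketch: parameters for a low-level DynamoDB Query request.


def build_query(table_name, pk_name, pk_value, sk_name=None, sk_prefix=None):
    """Partition key: equality only. Optional sort-key condition: begins_with,
    one of the operators (=, <, <=, >, >=, BETWEEN, begins_with) allowed on
    sort keys."""
    names = {"#pk": pk_name}
    values = {":pk": {"S": pk_value}}
    condition = "#pk = :pk"
    if sk_name is not None and sk_prefix is not None:
        names["#sk"] = sk_name
        values[":sk"] = {"S": sk_prefix}
        condition += " AND begins_with(#sk, :sk)"
    return {
        "TableName": table_name,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


# Hypothetical table "Courses" with course_id hash key and seminar_id sort key.
params = build_query("Courses", "course_id", "ABC-1234567",
                     sk_name="seminar_id", sk_prefix="BBB")
print(params["KeyConditionExpression"])  # → #pk = :pk AND begins_with(#sk, :sk)
```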

Scan API:

A scan operation reads the entire table. You can specify a filter to refine the values returned to you, but it is applied only after the items have been read, so it does not reduce the amount of data scanned.
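A minimal sketch of reading a whole table with Scan, assuming a `scan_page` callable shaped like `boto3.client("dynamodb").scan` (a fake stands in for it here so the sketch runs offline). Each page is followed via `LastEvaluatedKey`/`ExclusiveStartKey`, and comparing `ScannedCount` with the number of returned items shows that the filter only trims results after they have been read. The table name and filter are hypothetical.

```python
# Sketch: read a whole table with Scan, following pagination. `scan_page` is
# assumed to behave like boto3.client("dynamodb").scan(**kwargs).


def scan_all(scan_page, request):
    """Return (matching_items, scanned_count) for a complete Scan."""
    items = []
    scanned = 0
    start_key = None
    while True:
        page_request = dict(request)  # copy so the caller's dict is untouched
        if start_key is not None:
            page_request["ExclusiveStartKey"] = start_key
        page = scan_page(**page_request)
        items.extend(page.get("Items", []))
        # ScannedCount counts items read before the filter was applied.
        scanned += page.get("ScannedCount", 0)
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:  # no more pages
            return items, scanned


# Offline stand-in for the real client: two canned pages; the item read on
# the second page was removed by the filter.
_PAGES = [
    {"Items": [{"course_id": {"S": "ABC-1234567"}}], "ScannedCount": 1,
     "LastEvaluatedKey": {"course_id": {"S": "ABC-1234567"}}},
    {"Items": [], "ScannedCount": 1},
]


def fake_scan(**kwargs):
    return _PAGES[1] if "ExclusiveStartKey" in kwargs else _PAGES[0]


request = {
    "TableName": "Courses",  # hypothetical table name
    "FilterExpression": "#name = :name",
    "ExpressionAttributeNames": {"#name": "course_name"},
    "ExpressionAttributeValues": {":name": {"S": "Statistics 101"}},
}
items, scanned = scan_all(fake_scan, request)
print(len(items), scanned)  # → 1 2
```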

To use the Query API, the hash key value is required. The question gives no indication that the hash key value is available. According to the question, the data needs to be queried by the location attribute, which sits inside a DynamoDB List data type. The next option, then, is to look at a GSI.

Please read more about GSIs. One of the rules is that a GSI can be created using top-level attributes only, so location can't be used to create the index.

So creating a GSI in order to use the Query API is ruled out as well.

The index key attributes can consist of any top-level String, Number, or Binary attributes from the base table; other scalar types, document types, and set types are not allowed.

For the above reasons, the Query API can't be used to get the data based on the location attribute, assuming the hash key value is not available.

If the hash key value is available, a FilterExpression can be used to filter the data. The only way to filter on data inside a complex list data type is the CONTAINS function, and CONTAINS requires every attribute of the list element to match (i.e. seminar_id, location, duration and all the other attributes). So it is not possible to fulfil the use case in the question with the current data model.
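To illustrate the CONTAINS rule, the sketch below builds a FilterExpression that passes a complete seminar map as the operand; `to_attr` and `contains_filter` are hypothetical helpers that handle only string values, as in the sample documents. The last two lines are a local Python analog, not a DynamoDB call: list membership compares whole elements, so a map holding just the location does not match.

```python
# Sketch: contains(path, operand) on a List attribute matches only when the
# operand equals a whole element, so the full seminar map must be supplied.


def to_attr(value):
    """Hypothetical helper: convert str/dict values into the low-level typed
    format. Only strings and maps are handled, as in the sample documents."""
    if isinstance(value, dict):
        return {"M": {key: to_attr(val) for key, val in value.items()}}
    return {"S": value}


def contains_filter(seminar):
    """FilterExpression parameters that keep items whose seminars list
    contains exactly this seminar map."""
    return {
        "FilterExpression": "contains(#seminars, :seminar)",
        "ExpressionAttributeNames": {"#seminars": "seminars"},
        "ExpressionAttributeValues": {":seminar": to_attr(seminar)},
    }


seminar = {
    "seminar_id": "TTRHJK-43278",
    "epoch_time": "123456789",
    "duration": "5400",
    "location": "Catwoman Hall",
    "name": "Emotional support octopi for cats",
    "registrations": "88",
    "attendees": "87",
}
params = contains_filter(seminar)

# Local analog of the matching rule (plain Python, not a DynamoDB call).
seminars = [seminar]
print({"location": "Catwoman Hall"} in seminars)  # → False: partial map
print(dict(seminar) in seminars)  # → True: every attribute matches
```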

Proposed alternative solution:

Remodeling the data structure as shown below is one option to resolve the problem. With the current model, there is no other way to fulfil the use case using the Query API.

Main table:

Course Id - Hash Key

seminar_id - Sort Key

GSI:

Seminar location - Hash Key

Course Id - Sort Key
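The proposed key schema could be expressed as CreateTable parameters roughly as follows, for example for `boto3.client("dynamodb").create_table(**create_table_params)`. The table name, index name, on-demand billing mode and the choice to project `registrations` and `attendees` into the index are assumptions, not part of the answer. Every key attribute is declared as a top-level String, Number or Binary, which is the GSI rule quoted above.

```python
# Sketch: CreateTable parameters for the proposed model. "Seminars" and
# "location-course-index" are hypothetical names.
create_table_params = {
    "TableName": "Seminars",
    # Only key attributes are declared; each must be top-level S, N or B.
    "AttributeDefinitions": [
        {"AttributeName": "course_id", "AttributeType": "S"},
        {"AttributeName": "seminar_id", "AttributeType": "S"},
        {"AttributeName": "location", "AttributeType": "S"},
    ],
    # Main table: course_id hash key, seminar_id sort key.
    "KeySchema": [
        {"AttributeName": "course_id", "KeyType": "HASH"},
        {"AttributeName": "seminar_id", "KeyType": "RANGE"},
    ],
    # GSI: location hash key, course_id sort key.
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "location-course-index",
            "KeySchema": [
                {"AttributeName": "location", "KeyType": "HASH"},
                {"AttributeName": "course_id", "KeyType": "RANGE"},
            ],
            # Table and index keys are always projected; add the two
            # attributes needed for the attendance calculation.
            "Projection": {
                "ProjectionType": "INCLUDE",
                "NonKeyAttributes": ["registrations", "attendees"],
            },
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand, so no throughput settings
}

key_types = {d["AttributeName"]: d["AttributeType"]
             for d in create_table_params["AttributeDefinitions"]}
print(key_types)
```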

In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
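A sketch of how the nested course documents could be split into one item per seminar for this model; `flatten_course` is a hypothetical helper, and course-level attributes such as `course_name` and `tutors` are left out for brevity. With the sample data (trimmed to keys), the seminar ID `BBBCCC-44444` appears in both courses, yet each (course_id, seminar_id) pair stays unique, while the GSI key pair ("Nelson Mandela Theatre", "ABC-1234567") occurs twice, which the index allows.

```python
# Sketch: split nested course documents into one item per seminar.


def flatten_course(course):
    """Return one item per seminar, each carrying the course_id hash key.
    Course-level attributes (course_name, tutors) are omitted for brevity."""
    return [{"course_id": course["course_id"], **seminar}
            for seminar in course.get("seminars", [])]


# Sample documents from the question, trimmed to the key attributes.
courses = [
    {"course_id": "ABC-1234567", "seminars": [
        {"seminar_id": "XXXYYY-12345", "location": "Nelson Mandela Theatre"},
        {"seminar_id": "BBBCCC-44444", "location": "Nelson Mandela Theatre"},
        {"seminar_id": "XXXAAA-54321", "location": "Starbucks"},
    ]},
    {"course_id": "CJX-5553389", "seminars": [
        {"seminar_id": "TTRHJK-43278", "location": "Catwoman Hall"},
        {"seminar_id": "BBBCCC-44444", "location": "Nelson Mandela Theatre"},
    ]},
]

items = [item for course in courses for item in flatten_course(course)]

# Base-table keys (course_id, seminar_id) must be unique ...
primary_keys = {(item["course_id"], item["seminar_id"]) for item in items}
# ... but GSI keys (location, course_id) may repeat.
gsi_keys = [(item["location"], item["course_id"]) for item in items]

print(len(items), len(primary_keys), len(set(gsi_keys)))  # → 5 5 4
```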

Now you can use the Query API on the GSI to get the data for seminars whose location is equal to Nelson Mandela Theatre. If you know the course id, you can use it in the query as well. The query will potentially return multiple items in the result set. You can use a FilterExpression if you would like to further filter the data based on non-key attributes.
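A sketch of the Query against the GSI plus the client-side averaging the question is after. The table name `Seminars` and index name `location-course-index` are hypothetical, and the response items are built by hand from the question's sample data in the low-level typed format. DynamoDB has no server-side average, so the aggregation still happens in the client, but only over the items for one location.

```python
# Sketch: query the hypothetical GSI for one location, then average the
# attendees client-side (DynamoDB itself has no AVG aggregate).


def build_location_query(location, course_id=None):
    """Query parameters for the GSI; course_id narrows on the index sort key."""
    names = {"#loc": "location"}  # placeholders sidestep reserved words
    values = {":loc": {"S": location}}
    condition = "#loc = :loc"
    if course_id is not None:
        names["#cid"] = "course_id"
        values[":cid"] = {"S": course_id}
        condition += " AND #cid = :cid"
    return {
        "TableName": "Seminars",               # hypothetical
        "IndexName": "location-course-index",  # hypothetical
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


def average_attendance(items):
    """Mean of the attendees attribute over items that have one, or None."""
    values = []
    for item in items:
        attr = item.get("attendees")
        if attr is None:
            continue  # no attendance recorded for this seminar
        # Accept a Number ("N") or, as in the sample data, a String ("S").
        values.append(int(attr.get("N", attr.get("S"))))
    return sum(values) / len(values) if values else None


# Hand-built Query response items for "Nelson Mandela Theatre".
response_items = [
    {"course_id": {"S": "ABC-1234567"}, "attendees": {"S": "61"}},
    {"course_id": {"S": "ABC-1234567"}, "attendees": {"S": "240"}},
    {"course_id": {"S": "CJX-5553389"}, "attendees": {"S": "44"}},
]

params = build_location_query("Nelson Mandela Theatre")
print(params["KeyConditionExpression"])    # → #loc = :loc
print(average_attendance(response_items))  # → 115.0
```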

notionquest answered Nov 18 '22 19:11


This is an example from here that uses a filter expression. It uses a scan operation, but maybe you can apply something similar to a query instead of a scan (take a look at the API):

{
    "TableName": "MyTable",
    "FilterExpression": "#k_Compatible.#k_RAM = :v_Compatible_RAM",
    "ExpressionAttributeNames": {
        "#k_Compatible": "Compatible",
        "#k_RAM": "RAM"
    },
    "ExpressionAttributeValues": {
        ":v_Compatible_RAM": "RAM1"
    }
}
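Applied to the question's data, a nested filter on a Query might look like the sketch below; the table name `Courses` and the helper name are assumptions. A document path can dereference a map (as `Compatible.RAM` does above) or a list element, but the list index has to be a fixed number such as `seminars[0]`; there is no wildcard meaning "any seminar", so this only helps when the seminar's position is known. The filter is also applied after the course items for the key have been read.

```python
# Sketch: a Query whose FilterExpression uses a nested document path.
# The list index must be a literal number; there is no "any element" form.


def build_nested_filter_query(course_id, index, location):
    """Keep the course only if seminars[index].location equals `location`."""
    if not isinstance(index, int) or index < 0:
        raise ValueError("list index must be a non-negative integer")
    return {
        "TableName": "Courses",  # hypothetical table name
        "KeyConditionExpression": "#cid = :cid",
        "FilterExpression": f"#sem[{index}].#loc = :loc",
        "ExpressionAttributeNames": {
            "#cid": "course_id",
            "#sem": "seminars",
            "#loc": "location",
        },
        "ExpressionAttributeValues": {
            ":cid": {"S": course_id},
            ":loc": {"S": location},
        },
    }


params = build_nested_filter_query("ABC-1234567", 0, "Nelson Mandela Theatre")
print(params["FilterExpression"])  # → #sem[0].#loc = :loc
```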
CCarlos answered Nov 18 '22 19:11