How to join tables in AWS DynamoDB?

Tags: amazon-web-services, amazon-dynamodb, amazon

I know the whole design should be based on natural aggregates (documents). However, I'm thinking of implementing a separate table for localisations (lang, key, text) and then using its keys in other tables. I was unable to find any example of doing this.

Any pointers might be helpful!

asked Apr 20 '16 19:04

Centurion

5 Answers

You are correct, DynamoDB is not designed as a relational database and does not support join operations. You can think about DynamoDB as just being a set of key-value pairs.

You can have the same keys across multiple tables (e.g. document_IDs), but DynamoDB doesn't automatically sync them or have any foreign-key features. The document_IDs in one table, while named the same, are technically a different set than the ones in a different table. It's up to your application software to make sure that those keys are synced.
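
One way an application might keep such references honest at write time is a transactional write that pairs the put with a condition check on the other table. The following is only a sketch: the table names `localisations` and `documents`, the attribute names, and the helper function are all hypothetical, and the request shape is the one accepted by the low-level boto3 client's `transact_write_items` call.

```python
def build_put_with_reference_check(document_id, title_key, lang):
    """Build TransactItems: put a document only if its localisation exists.

    Table and attribute names are hypothetical, chosen for illustration.
    """
    return [
        {
            # The whole transaction is rejected if the referenced item is missing.
            "ConditionCheck": {
                "TableName": "localisations",
                "Key": {"lang": {"S": lang}, "key": {"S": title_key}},
                "ConditionExpression": "attribute_exists(#k)",
                # "key" is aliased because it may clash with a reserved word.
                "ExpressionAttributeNames": {"#k": "key"},
            }
        },
        {
            "Put": {
                "TableName": "documents",
                "Item": {
                    "document_id": {"S": document_id},
                    "title_key": {"S": title_key},
                },
            }
        },
    ]


# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").transact_write_items(
#       TransactItems=build_put_with_reference_check("doc-1", "greeting", "en"))
```

Note that this only verifies the reference when the document is written. Nothing stops the localisation from being deleted later, so it is not a real foreign key.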

DynamoDB is a different way of thinking about databases and you might want to consider using a managed relational database such as Amazon Aurora: https://aws.amazon.com/rds/aurora/

One thing to note, Amazon EMR does allow DynamoDB tables to be joined, but I'm not sure that's what you're looking for: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/EMRforDynamoDB.html

answered Oct 08 '22 04:10

Reid Hughes

With DynamoDB, rather than join I think the best solution is to store the data in the shape you later intend to read it.

If you find yourself requiring complex read queries you might have fallen into the trap of expecting DynamoDB to behave like an RDBMS, which it is not. Transform and shape the data you write, keep the read simple.

Disk is far cheaper than compute these days - don't be afraid to denormalise.
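
For the localisation case in the question, denormalising could mean resolving the keys before the write and storing the translations on the document item itself. A minimal sketch, assuming a hypothetical document shape (`document_id` plus a `fields` map of field name to localisation key) and localisation rows given as `(lang, key, text)` tuples:

```python
def denormalise(document, localisations):
    """Return an item with the document's localisation keys resolved inline.

    `document` and `localisations` use assumed shapes, not a DynamoDB schema.
    """
    texts = {}
    for lang, key, text in localisations:
        texts.setdefault(key, {})[lang] = text

    item = {"document_id": document["document_id"]}
    for field, key in document["fields"].items():
        # Embed every translation so a single read returns display-ready data.
        item[field] = texts.get(key, {})
    return item


rows = [("en", "greeting", "Hello"), ("de", "greeting", "Hallo")]
doc = {"document_id": "doc-1", "fields": {"title": "greeting"}}
item = denormalise(doc, rows)
# item == {"document_id": "doc-1", "title": {"en": "Hello", "de": "Hallo"}}
```

The resulting dict could then be written with a boto3 `Table.put_item(Item=item)` call. The trade-off is that changing a translation means rewriting every item that embeds it.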

answered Oct 08 '22 03:10

Lloyd

Update: This answer is well within the defined community guidelines and not a non-answer speaking only about a commercial solution.

One solution I have seen come up multiple times in this space is to sync from DynamoDB into a separate database that is better suited to the types of operations you're looking for.

I wrote a blog post on this topic comparing the various approaches I've seen people take to this very problem, but I'll summarize some of the key takeaways here so you don't have to read all of it.

DynamoDB secondary indexes

What's good?

Fast and no other systems needed!
Good for a very specific analytic feature you're building (like a leaderboard)

Considerations

Limited number of secondary indexes, limited fidelity of queries
Expensive if you're depending on scans
Security and performance concerns using production database directly for analytics
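
As an illustration of the leaderboard case, a query against a global secondary index might look roughly like the sketch below. The table name `scores`, the index name `game-score-index`, and its key attributes are assumptions made up for this example. The parameters themselves are the ones the low-level `query` operation accepts.

```python
def top_scores_query(game_id, limit=10):
    """Build keyword arguments for a low-level `query` call against a GSI.

    Assumes a hypothetical table `scores` with a global secondary index
    `game-score-index` (partition key `game_id`, sort key `score`).
    """
    return {
        "TableName": "scores",
        "IndexName": "game-score-index",
        "KeyConditionExpression": "#g = :g",
        "ExpressionAttributeNames": {"#g": "game_id"},
        "ExpressionAttributeValues": {":g": {"S": game_id}},
        "ScanIndexForward": False,  # descending order of the index sort key
        "Limit": limit,
    }


# With boto3 installed and credentials configured:
#   boto3.client("dynamodb").query(**top_scores_query("chess"))
```

Because the index is sorted by score, the top entries come back without a scan, which is why this works well for one specific feature but not for ad hoc questions.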

DynamoDB + Glue + S3 + Athena

What's good?

All components are “serverless” and require no provisioning of infrastructure
Easy to automate ETL pipeline

Considerations

High end-to-end data latency of several hours, which means stale data
Query latency ranges from tens of seconds to minutes
Schema enforcement can lose information with mixed types
ETL process can require maintenance from time to time if structure of data in source changes

DynamoDB + Hive/Spark

What's good?

Queries over latest data in DynamoDB
Requires no ETL/pre-processing other than specifying a schema

Considerations

Schema enforcement can lose information when fields have mixed types
EMR cluster requires some administration and infrastructure management
Queries over the latest data involve scans and are expensive
Query latency ranges from tens of seconds to minutes directly on Hive/Spark
Security and performance implications of running analytical queries on an operational database

DynamoDB + AWS Lambda + Elasticsearch

What's good?

Full-text search support
Support for several types of analytical queries
Can work over the latest data in DynamoDB

Considerations

Requires management and monitoring of infrastructure for ingesting, indexing, replication, and sharding
Requires separate system to ensure data integrity and consistency between DynamoDB and Elasticsearch
Scaling is manual and requires provisioning additional infrastructure and operations
No support for joins between different indexes
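
To give a feel for the Lambda piece of this setup, here is a rough sketch of a handler that turns DynamoDB Streams records into index and delete actions. It assumes the stream is configured to include new images, supports only a few attribute types, and leaves the call to the search cluster out entirely. The function names and the document id scheme are made up for this example.

```python
def from_attr(value):
    """Convert one DynamoDB-typed attribute value to a plain Python value."""
    tag, raw = next(iter(value.items()))
    if tag in ("S", "BOOL"):
        return raw
    if tag == "N":
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: from_attr(v) for k, v in raw.items()}
    if tag == "L":
        return [from_attr(v) for v in raw]
    raise ValueError(f"unsupported attribute type: {tag}")


def to_index_actions(event):
    """Translate a DynamoDB Streams event into (op, doc_id, body) tuples."""
    actions = []
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        # Join key values in sorted attribute order for a stable document id.
        doc_id = "|".join(str(from_attr(keys[name])) for name in sorted(keys))
        if record["eventName"] == "REMOVE":
            actions.append(("delete", doc_id, None))
        else:  # INSERT or MODIFY
            image = record["dynamodb"]["NewImage"]
            body = {k: from_attr(v) for k, v in image.items()}
            actions.append(("index", doc_id, body))
    return actions


def handler(event, context):
    """Lambda entry point; sending the actions to the cluster is omitted."""
    return to_index_actions(event)
```

Retries, ordering, and failed batches are exactly the integrity concerns listed above, and this sketch does not address them.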

DynamoDB + Rockset

What's good?

Completely serverless. No operations or provisioning of infrastructure or database required
Live sync between DynamoDB and the Rockset collection, so that they are never more than a few seconds apart
Monitoring to ensure consistency between DynamoDB and Rockset
Automatic indexes built over the data enabling low-latency queries
SQL query serving that can scale to high QPS
Joins with data from other sources such as Amazon Kinesis, Apache Kafka, Amazon S3, etc.
Integrations with tools like Tableau, Redash, and Superset, plus a SQL API over REST and client libraries
Features including full-text search, ingest transformations, retention, encryption, and fine-grained access control

Considerations

Not a great fit for storing rarely queried data (like machine logs)
Not a transactional datastore

(Full Disclosure: I work on the product team @ Rockset) Check out the blog for more details on the individual approaches.

answered Oct 08 '22 04:10

2 revs

You must query the first table, then iterate through each item with a get request on the next table.
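
A minimal sketch of that loop, assuming two objects that behave like boto3 `Table` resources and hypothetical attribute names (`title_key` on the documents, `lang`/`key`/`text` on the localisations). A `scan` stands in here for whatever query selects the first set of items:

```python
def join_with_localisations(documents_table, localisations_table, lang):
    """Yield each document with its title resolved from the second table.

    Both tables are expected to offer boto3-style `scan` and `get_item`
    methods; the table layout and attribute names are hypothetical.
    """
    scan_kwargs = {}
    while True:
        page = documents_table.scan(**scan_kwargs)
        for doc in page["Items"]:
            found = localisations_table.get_item(
                Key={"lang": lang, "key": doc["title_key"]}
            )
            # "Item" is absent from the response when nothing matches.
            text = found.get("Item", {}).get("text")
            yield {**doc, "title": text}
        # Results are paginated; continue until no further page is reported.
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

This costs one read per item in the first table. Batching the lookups with `batch_get_item` could cut the number of round trips, but it is still a join done in application code.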

The other answers are unsatisfactory because 1) they don't answer the question and, more importantly, 2) how can you design your tables in advance without knowing their future application? The technical debt is just too high to reasonably cover unbounded future possibilities.

My answer is horribly inefficient, but this is the only current solution to the posed question.

I eagerly await a better answer.

answered Oct 08 '22 02:10

James Shiztar

I know my response is a couple of years late. However, I was able to dig up some additional information about Amazon DynamoDB and joins that might benefit you (or anyone else who stumbles upon this discussion while researching the topic).

To get to the point, I located some documentation on the Amazon DynamoDB website stating that the Apache HiveQL query language can be used to perform joins on Amazon DynamoDB tables.

Querying Data in DynamoDB (w/ HiveQL): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Querying.html

Working w/ Amazon DynamoDB & Apache Hive: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Tutorial.html

Processing Amazon DynamoDB Data with Apache Hive on Amazon EMR: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.html

I hope this information helps someone out, if not the original poster.

answered Oct 08 '22 03:10

Matti
