I have a huge DynamoDB table that I want to analyze by aggregating data stored in its attributes. The aggregated data should then be processed by a Java application. While I understand the very basic concepts behind MapReduce, I've never used it before.
In my case, let's say that I have a customerId and an orderNumbers attribute in every DynamoDB item, and that I can have more than one item for the same customer, like:
customerId: 1, orderNumbers: 2
customerId: 1, orderNumbers: 6
customerId: 2, orderNumbers: -1
Basically I want to sum the orderNumbers for each customerId, and then execute some operations in Java with the aggregate.
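Before involving EMR at all, it can help to pin down exactly what the job should compute. The following is a minimal in-memory Java sketch (not MapReduce and not DynamoDB code) that sums orderNumbers per customerId for the three sample items above. `OrderItem` and `OrderAggregation` are hypothetical names, and both attributes are assumed to be numbers. The same code could serve as a reference result when testing a real job on a small sample.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for one DynamoDB item (customerId, orderNumbers).
record OrderItem(long customerId, long orderNumbers) {}

final class OrderAggregation {
    // Groups items by customerId and sums their orderNumbers.
    // A TreeMap keeps the customers sorted, which makes the output easy to compare.
    static Map<Long, Long> sumByCustomer(List<OrderItem> items) {
        return items.stream()
                .collect(Collectors.groupingBy(
                        OrderItem::customerId,
                        TreeMap::new,
                        Collectors.summingLong(OrderItem::orderNumbers)));
    }

    public static void main(String[] args) {
        List<OrderItem> items = List.of(
                new OrderItem(1, 2),
                new OrderItem(1, 6),
                new OrderItem(2, -1));
        System.out.println(sumByCustomer(items)); // prints {1=8, 2=-1}
    }
}
```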
AWS Elastic MapReduce could probably help me, but I don't understand how to connect a custom JAR with DynamoDB. My custom JAR probably needs to expose both a map and a reduce function; where can I find the right interface to implement?
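In Hadoop's Java API, the classes to extend are `org.apache.hadoop.mapreduce.Mapper` and `org.apache.hadoop.mapreduce.Reducer`, overriding `map(key, value, context)` and `reduce(key, values, context)`. To show what those two functions would do for this aggregation without pulling in Hadoop, here is a plain-Java sketch. `MapFn`, `ReduceFn`, `Item` and `MiniMapReduce` are hypothetical stand-ins that only mirror the shape of the Hadoop classes, and the in-memory "shuffle" is a simplification of what Hadoop does across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for one DynamoDB item.
record Item(long customerId, long orderNumbers) {}

// Hypothetical interfaces mirroring the shape of Hadoop's Mapper.map and Reducer.reduce.
interface MapFn<I, K, V> {
    void map(I input, BiConsumer<K, V> emit);
}

interface ReduceFn<K, V, R> {
    void reduce(K key, Iterable<V> values, BiConsumer<K, R> emit);
}

final class MiniMapReduce {
    // Runs the map phase, groups emitted values by key (the "shuffle", sorted by key
    // as Hadoop does), then calls the reducer once per key. Everything is in memory.
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs, MapFn<I, K, V> mapper, ReduceFn<K, V, R> reducer) {
        Map<K, List<V>> shuffled = new TreeMap<>();
        for (I input : inputs) {
            mapper.map(input, (k, v) -> shuffled.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        Map<K, R> output = new TreeMap<>();
        shuffled.forEach((k, values) -> reducer.reduce(k, values, output::put));
        return output;
    }

    // Mapper: emit (customerId, orderNumbers) for every item.
    static final MapFn<Item, Long, Long> MAPPER =
            (item, emit) -> emit.accept(item.customerId(), item.orderNumbers());

    // Reducer: sum all orderNumbers emitted for one customerId.
    static final ReduceFn<Long, Long, Long> REDUCER = (customerId, values, emit) -> {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        emit.accept(customerId, sum);
    };

    public static void main(String[] args) {
        List<Item> items = List.of(new Item(1, 2), new Item(1, 6), new Item(2, -1));
        System.out.println(run(items, MAPPER, REDUCER)); // prints {1=8, 2=-1}
    }
}
```

In a real Hadoop job, the mapper body would read the two attributes from whatever record type the DynamoDB input format provides, while the reducer logic would stay essentially the same.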
Plus, I'm a bit confused by the docs: it seems like I should first export my data to S3 before running my custom JAR. Is this correct?
Thanks
DynamoDB is designed for OLTP use cases — high speed, high velocity data access where you're operating on a few records at a time. But users also have a need for OLAP access patterns — big, analytical queries over the entire dataset to find popular items, or number of orders by day, or other insights.
Note: I haven't set up a working EMR job myself; I've only read about it.
First of all, check the prerequisites: Prerequisites for Integrating Amazon EMR with Amazon DynamoDB
You can work directly on DynamoDB data with Hive, without exporting it to S3 first: see Hive Command Examples for Exporting, Importing, and Querying Data in Amazon DynamoDB. As you can see, you can run "SQL-like" queries that way.
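For this particular aggregation, a Hive query may be enough, with no custom JAR at all. The sketch below only builds the HiveQL text in Java: an external table backed by `org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler` (the storage handler used in the AWS Hive examples) plus a `GROUP BY` query. The DynamoDB table name `Orders`, the Hive table name `ddb_orders`, the Hive column names and the `BIGINT` column types are all assumptions; adjust them to your schema and run the printed statements in Hive on the EMR cluster.

```java
final class HiveScriptBuilder {
    // Builds DDL for a Hive external table that reads directly from a DynamoDB table.
    // dynamodb.column.mapping pairs each Hive column with a DynamoDB attribute name.
    static String createExternalTable(String hiveTable, String dynamoTable) {
        return String.join("\n",
                "CREATE EXTERNAL TABLE " + hiveTable + " (customer_id BIGINT, order_numbers BIGINT)",
                "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'",
                "TBLPROPERTIES (",
                "  \"dynamodb.table.name\" = \"" + dynamoTable + "\",",
                "  \"dynamodb.column.mapping\" = \"customer_id:customerId,order_numbers:orderNumbers\"",
                ");");
    }

    // Sums orderNumbers per customerId, the same aggregation as in the question.
    static String sumPerCustomer(String hiveTable) {
        return "SELECT customer_id, SUM(order_numbers) FROM " + hiveTable
                + " GROUP BY customer_id;";
    }

    public static void main(String[] args) {
        System.out.println(createExternalTable("ddb_orders", "Orders"));
        System.out.println(sumPerCustomer("ddb_orders"));
    }
}
```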
If you have no prior knowledge of Hadoop, you should probably read some introductory material, such as: What is Hadoop
This tutorial is another good read: Using Amazon Elastic MapReduce with DynamoDB
Regarding your custom JAR application, you need to upload it to S3. Use this guide: How to Create a Job Flow Using a Custom JAR
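When a custom JAR job finishes, its results are written as text files to the job's output location (typically S3). Assuming the job uses Hadoop's default `TextOutputFormat`, each line is the key, a tab, then the value, so the Java application that processes the aggregate can parse those lines back into a map. This is a minimal sketch: `JobOutputParser` is a hypothetical name, it reads from any `java.io.Reader`, and fetching the output files from S3 (for example with the AWS SDK) is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

final class JobOutputParser {
    // Parses reducer output lines of the form "<customerId>\t<sum>"
    // (tab is TextOutputFormat's default key/value separator).
    static Map<Long, Long> parse(Reader source) throws IOException {
        Map<Long, Long> totals = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // skip empty lines, e.g. a trailing newline
                }
                String[] parts = line.split("\t", 2);
                if (parts.length != 2) {
                    throw new IOException("Malformed output line: " + line);
                }
                totals.put(Long.parseLong(parts[0].trim()), Long.parseLong(parts[1].trim()));
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        String sample = "1\t8\n2\t-1\n";
        System.out.println(parse(new StringReader(sample))); // prints {1=8, 2=-1}
    }
}
```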
I hope this will help you get started.