AWS DynamoDB and MapReduce in Java

I have a huge DynamoDB table that I want to analyze to aggregate data that is stored in its attributes. The aggregated data should then be processed by a Java application. While I understand the really basic concepts behind MapReduce, I've never used it before.

In my case, let's say that every DynamoDB item has a customerId and an orderNumbers attribute, and that there can be more than one item for the same customer. For example:

customerId: 1, orderNumbers: 2
customerId: 1, orderNumbers: 6
customerId: 2, orderNumbers: -1

Basically I want to sum the orderNumbers for each customerId, and then execute some operations in Java with the aggregate.

AWS Elastic MapReduce could probably help me, but I don't understand how to connect a custom JAR with DynamoDB. My custom JAR probably needs to expose both a map and a reduce function. Where can I find the right interface to implement?

I'm also a bit confused by the docs: it seems I should first export my data to S3 before running my custom JAR. Is this correct?

Thanks

asked Apr 08 '12 23:04 by Mark



1 Answer

Note: I haven't built a working EMR job flow myself; I've only read about it.

First of all, read "Prerequisites for Integrating Amazon EMR with Amazon DynamoDB".

You can work directly on DynamoDB, so exporting to S3 first is not required: see "Hive Command Examples for Exporting, Importing, and Querying Data in Amazon DynamoDB". As you can see there, you can run "SQL-like" queries that way.

If you have zero knowledge of Hadoop, you should probably read some introductory material first, such as "What is Hadoop".

This tutorial is another good read: "Using Amazon Elastic MapReduce with DynamoDB".

Regarding your custom JAR application, you need to upload it to S3. Use this guide: "How to Create a Job Flow Using a Custom JAR".
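As for the map and reduce functions themselves: in Hadoop's Java API, the classes a custom JAR extends are Mapper and Reducer in the org.apache.hadoop.mapreduce package. The sketch below deliberately does not use them, so that it runs without Hadoop on the classpath. It only illustrates what the two steps would compute for the customerId/orderNumbers example from the question. The names OrderSumSketch, Item and aggregate are made up for illustration, and the in-memory grouping is a stand-in for the shuffle that the framework would do between the two steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderSumSketch {

    /** Hypothetical holder for the two attributes of one DynamoDB item. */
    static final class Item {
        final String customerId;
        final long orderNumbers;

        Item(String customerId, long orderNumbers) {
            this.customerId = customerId;
            this.orderNumbers = orderNumbers;
        }
    }

    /** Map step: emit one (customerId, orderNumbers) pair per item. */
    static Map.Entry<String, Long> map(Item item) {
        return Map.entry(item.customerId, item.orderNumbers);
    }

    /** Reduce step: sum every value that was grouped under one customerId. */
    static long reduce(String customerId, List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    /** Stand-in for the framework: map each item, group by key, then reduce. */
    static Map<String, Long> aggregate(List<Item> items) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Item item : items) {
            Map.Entry<String, Long> pair = map(item);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            totals.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // The three sample items from the question.
        List<Item> items = List.of(
                new Item("1", 2),
                new Item("1", 6),
                new Item("2", -1));
        System.out.println(aggregate(items)); // prints {1=8, 2=-1}
    }
}
```

In a real job the map and reduce bodies would stay about this small; reading the items from DynamoDB and grouping them by key would be handled by the EMR/Hadoop setup described in the guides above.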

I hope this will help you get started.

answered Sep 29 '22 02:09 by Chen Harel