
Copying data from MySQL to Amazon DynamoDB

I have a table in MySQL containing 500 million records. I want to import this table into Amazon DynamoDB. I understand there are two ways to do it:

  1. Java API: The problem with this approach is that it is slow, and the connection to the database sometimes gets dropped.

  2. Amazon Data Import Pipeline: Seems promising, but how do I export the data from MySQL to the format recognized by DynamoDB?

Please let me know the best possible approach between the two.

user2730428 asked Jan 23 '15 06:01


2 Answers

AWS has two services that can help you perform this operation.

  • Data Pipeline
  • EMR cluster with Hive

Data Pipeline

A very simple way - if your "schemas" are similar (I always feel awkward talking about a schema for DynamoDB) - would be to export from MySQL to S3, then import from S3 into DynamoDB.

Data Pipeline has two tutorials to help you set up these tasks:

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-copydata-mysql.html
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html

You can further improve this process by developing a single pipeline that performs both the export and the import. Should you need to transform the data between the two steps, you will need to develop your transformation code and execute it from the pipeline.

In Data Pipeline terms, this is called an Activity. An activity might be as simple as a shell script or as complex as a Hive / Hadoop / Pig application running on an EMR cluster.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts-activities.html

Data Pipeline will also let you schedule your execution at regular time intervals.

Hive and EMR

Hive is a Hadoop tool for writing SQL commands to manipulate data sources. Hive translates the SQL into a Hadoop application that runs on a cluster. You can run Hive on an AWS Elastic MapReduce (EMR) cluster, which is a managed Hadoop cluster service.

Hive on EMR can connect to non-relational data sources, such as files on S3 or a DynamoDB database. It allows you to write SQL statements on top of DynamoDB!

In your use case, you need to write a Hive script that reads from MySQL and writes to DynamoDB. You can transform the data using standard (Hive) SQL expressions.

More about Hive on EMR:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html

More about DynamoDB and Hive:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Walkthrough.html
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/EMRforDynamoDB.html

Sébastien Stormacq answered Oct 18 '22 06:10


I found the easiest way for me was to write a script that transfers all the information into a JSON file in the format specified here: AWS Load Data

{
    "ProductCatalog": [
        {
            "PutRequest": {
                "Item": {
                    "Id": {
                        "N": "101"
                    },
                    "Title": {
                        "S": "Book 101 Title"
                    },
                    "ISBN": {
                        "S": "111-1111111111"
                    },
                    "Authors": {
                        "L": [
                            {
                                "S": "Author1"
                            }
                        ]
                    },
                    "Price": {
                        "N": "2"
                    },
                    "Dimensions": {
                        "S": "8.5 x 11.0 x 0.5"
                    },
                    "PageCount": {
                        "N": "500"
                    },
                    "InPublication": {
                        "BOOL": true
                    },
                    "ProductCategory": {
                        "S": "Book"
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "Id": {
                        "N": "103"
                    },
                    "Title": {
                        "S": "Book 103 Title"
                    },
                    "ISBN": {
                        "S": "333-3333333333"
                    },
                    "Authors": {
                        "L": [
                            {
                                "S": "Author1"
                            },
                            {
                                "S": "Author2"
                            }
                        ]
                    },
                    "Price": {
                        "N": "2000"
                    },
                    "Dimensions": {
                        "S": "8.5 x 11.0 x 1.5"
                    },
                    "PageCount": {
                        "N": "600"
                    },
                    "InPublication": {
                        "BOOL": false
                    },
                    "ProductCategory": {
                        "S": "Book"
                    }
                }
            }
        },
        {
            "PutRequest": {
                "Item": {
                    "Id": {
                        "N": "205"
                    },
                    "Title": {
                        "S": "18-Bike-204"
                    },
                    "Description": {
                        "S": "205 Description"
                    },
                    "BicycleType": {
                        "S": "Hybrid"
                    },
                    "Brand": {
                        "S": "Brand-Company C"
                    },
                    "Price": {
                        "N": "500"
                    },
                    "Color": {
                        "L": [
                            {
                                "S": "Red"
                            },
                            {
                                "S": "Black"
                            }
                        ]
                    },
                    "ProductCategory": {
                        "S": "Bicycle"
                    }
                }
            }
        }
    ]
}

and then create the tables and run the following command from my console:

aws dynamodb batch-write-item --request-items file://ProductCatalog.json
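
One thing to watch with the command above: BatchWriteItem accepts at most 25 put or delete requests per call, so a large table has to be split across many request files. Below is a minimal sketch of what such a generator script might look like. It assumes the rows have already been read from MySQL as plain dictionaries (the database read itself is left out), and the names `to_attribute_value`, `to_put_request`, `batches` and `write_request_files` are made up for illustration, not part of any AWS library.

```python
import json
from decimal import Decimal
from itertools import islice

BATCH_LIMIT = 25  # BatchWriteItem takes at most 25 put/delete requests per call


def to_attribute_value(value):
    """Map a Python value to a DynamoDB attribute value such as {"S": "text"}."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # DynamoDB numbers are sent as strings
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    return {"S": str(value)}  # assumption: everything else is stored as a string


def to_put_request(row):
    """Wrap one row (a dict of column name -> value) in a PutRequest."""
    item = {name: to_attribute_value(value) for name, value in row.items()}
    return {"PutRequest": {"Item": item}}


def batches(rows, size=BATCH_LIMIT):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_request_files(table_name, rows, prefix):
    """Write one request file per batch and return the file paths."""
    paths = []
    for index, chunk in enumerate(batches(rows)):
        path = f"{prefix}-{index:06d}.json"
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({table_name: [to_put_request(r) for r in chunk]}, handle)
        paths.append(path)
    return paths
```

Calling `write_request_files("ProductCatalog", rows, "ProductCatalog")` would produce files such as `ProductCatalog-000000.json`, each of which can be passed to `--request-items file://...` in turn.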

To download and configure the AWS CLI: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.CLI.html
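
A batch write can also succeed only partially: the response lists anything that was not written under `UnprocessedItems`, and those requests have to be sent again. Here is a rough sketch of a retry loop with exponential backoff. The function name `write_with_retry` and its `send` parameter are my own invention; `send` is assumed to be any callable that takes the request-items dictionary and returns a BatchWriteItem-style response.

```python
import time


def write_with_retry(send, request_items, max_attempts=8, base_delay=0.05):
    """Send one batch and resend whatever comes back in UnprocessedItems.

    Returns the number of calls made. `send` is a hypothetical injection
    point; with boto3 it could be, for example:
        client = boto3.client("dynamodb")
        send = lambda items: client.batch_write_item(RequestItems=items)
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = send(pending)
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return attempt + 1
        # Wait longer after each partial failure before retrying.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"items still unprocessed after {max_attempts} attempts")
```

Passing the sender in as a parameter keeps the loop independent of any particular client, so the same logic can be tried out with a stub before pointing it at a real table.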

chickens answered Oct 18 '22 04:10