I have a table in MySQL containing 500 million records. I want to import this table into Amazon DynamoDB. I understand there are two ways to do it:
Java API: the problem with this approach is that it is slow, and the connection to the database sometimes gets dropped.
Amazon Data Pipeline: this seems promising, but how do I export the data from MySQL into a format recognized by DynamoDB?
Please let me know which of the two is the better approach.
AWS has two services that can help you perform this operation.
Data Pipeline
A very simple way - if your "schemas" are similar (I always feel awkward talking about a schema for DynamoDB) - would be to export from MySQL to S3, then to import from S3 to DynamoDB.
Data Pipeline has two tutorials to help you set up these tasks:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-copydata-mysql.html
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html
You can further improve this process by developing a single pipeline that performs both the export and the import. Should you need to transform the data between the export and the import, you will need to develop your transformation code and execute it from the pipeline.
In Data Pipeline terms, this is called an Activity. An activity might be as simple as a shell script or as complex as a Hive / Hadoop / Pig application running on an EMR cluster. http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts-activities.html
Data Pipeline will also let you schedule your execution at regular time intervals.
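If you prefer to drive Data Pipeline from the command line rather than the console, a minimal sketch looks like the following. It assumes you have already written a pipeline definition file (called pipeline-definition.json here, built from the tutorials above); the pipeline name and file name are placeholders.

# Create an empty pipeline (the unique-id protects against creating duplicates on retry)
aws datapipeline create-pipeline --name mysql-to-dynamodb --unique-id mysql-to-dynamodb-token

# Attach the definition built from the tutorial templates (file name is an assumption)
aws datapipeline put-pipeline-definition --pipeline-id <pipeline-id-from-previous-command> --pipeline-definition file://pipeline-definition.json

# Start the pipeline
aws datapipeline activate-pipeline --pipeline-id <pipeline-id-from-previous-command>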
Hive and EMR
Hive is a Hadoop tool that lets you write SQL commands to manipulate data sources. Hive translates the SQL into a Hadoop application that runs on a cluster. You can run Hive on an AWS Elastic MapReduce (EMR) cluster, a managed Hadoop cluster service.
Hive on EMR can connect to non-relational data sources, such as files on S3 or a DynamoDB table. It allows you to write SQL statements on top of DynamoDB!
In your use case, you would write a Hive script that reads your data and writes it to DynamoDB. You can transform the data using standard Hive SQL expressions; a sketch is given after the documentation links below.
More about Hive on EMR: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html
More about DynamoDB and Hive: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Walkthrough.html http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/EMRforDynamoDB.html
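To give a feel for what such a Hive script can look like, here is a minimal sketch rather than a ready-to-run solution: it assumes the MySQL table has already been exported to CSV files on S3, that the DynamoDB table ProductCatalog already exists, and that the bucket path, column names, and types are invented for the example.

-- External table over the CSV export sitting in S3 (path and columns are assumptions)
CREATE EXTERNAL TABLE s3_export (id BIGINT, title STRING, price BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/mysql-export/';

-- External table backed by the existing DynamoDB table, using the storage handler shipped with EMR
CREATE EXTERNAL TABLE ddb_product_catalog (id BIGINT, title STRING, price BIGINT)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "ProductCatalog",
  "dynamodb.column.mapping" = "id:Id,title:Title,price:Price"
);

-- Copy (and optionally transform) the data; EMR performs the DynamoDB writes
INSERT OVERWRITE TABLE ddb_product_catalog
SELECT id, title, price FROM s3_export;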
I found the easiest way for me was to write a script that transfers all the information into a JSON file in the format specified here: AWS Load Data (a rough sketch of such a script is at the end of this answer)
{
  "ProductCatalog": [
    {
      "PutRequest": {
        "Item": {
          "Id": {
            "N": "101"
          },
          "Title": {
            "S": "Book 101 Title"
          },
          "ISBN": {
            "S": "111-1111111111"
          },
          "Authors": {
            "L": [
              {
                "S": "Author1"
              }
            ]
          },
          "Price": {
            "N": "2"
          },
          "Dimensions": {
            "S": "8.5 x 11.0 x 0.5"
          },
          "PageCount": {
            "N": "500"
          },
          "InPublication": {
            "BOOL": true
          },
          "ProductCategory": {
            "S": "Book"
          }
        }
      }
    },
    {
      "PutRequest": {
        "Item": {
          "Id": {
            "N": "103"
          },
          "Title": {
            "S": "Book 103 Title"
          },
          "ISBN": {
            "S": "333-3333333333"
          },
          "Authors": {
            "L": [
              {
                "S": "Author1"
              },
              {
                "S": "Author2"
              }
            ]
          },
          "Price": {
            "N": "2000"
          },
          "Dimensions": {
            "S": "8.5 x 11.0 x 1.5"
          },
          "PageCount": {
            "N": "600"
          },
          "InPublication": {
            "BOOL": false
          },
          "ProductCategory": {
            "S": "Book"
          }
        }
      }
    },
    {
      "PutRequest": {
        "Item": {
          "Id": {
            "N": "205"
          },
          "Title": {
            "S": "18-Bike-204"
          },
          "Description": {
            "S": "205 Description"
          },
          "BicycleType": {
            "S": "Hybrid"
          },
          "Brand": {
            "S": "Brand-Company C"
          },
          "Price": {
            "N": "500"
          },
          "Color": {
            "L": [
              {
                "S": "Red"
              },
              {
                "S": "Black"
              }
            ]
          },
          "ProductCategory": {
            "S": "Bicycle"
          }
        }
      }
    }
  ]
}
and then create the tables and run the following command from my console:
aws dynamodb batch-write-item --request-items file://ProductCatalog.json
To download and configure the AWS CLI: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.CLI.html
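For reference, here is a rough sketch of the kind of conversion script I mean. It is Python rather than anything official; the library (pymysql), connection details, and the MySQL-to-DynamoDB attribute mapping are all assumptions based on the ProductCatalog example above, so adapt the table, column, and file names to your own schema. Note that batch-write-item accepts at most 25 put requests per call, so the sketch writes one file per batch of 25.

import json
import pymysql  # assumption: any MySQL client library would work here

BATCH_SIZE = 25  # aws dynamodb batch-write-item accepts at most 25 put requests per call

def row_to_put_request(row):
    # Attribute names and types match the ProductCatalog example above; adapt to your table.
    return {
        "PutRequest": {
            "Item": {
                "Id": {"N": str(row["id"])},
                "Title": {"S": row["title"]},
                "Price": {"N": str(row["price"])},
            }
        }
    }

def write_batch(batch, index):
    # One file per batch; each file can then be passed to
    # aws dynamodb batch-write-item --request-items file://ProductCatalog-<index>.json
    with open("ProductCatalog-%06d.json" % index, "w") as f:
        json.dump({"ProductCatalog": batch}, f, indent=2)

def main():
    # Connection details and table/column names are placeholders.
    conn = pymysql.connect(host="localhost", user="user", password="secret",
                           db="mydb", cursorclass=pymysql.cursors.SSDictCursor)
    batch, file_index = [], 0
    with conn.cursor() as cursor:
        # Server-side cursor so 500 million rows are streamed instead of loaded into memory.
        cursor.execute("SELECT id, title, price FROM product_catalog")
        for row in cursor:
            batch.append(row_to_put_request(row))
            if len(batch) == BATCH_SIZE:
                write_batch(batch, file_index)
                batch, file_index = [], file_index + 1
    if batch:
        write_batch(batch, file_index)

if __name__ == "__main__":
    main()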