Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to bulk insert into Amazon Aurora RDS directly from Amazon S3 tab delimited files in C#?

I am currently using Amazon Redshift to store aggregated data from the 50 - 100 GB (ie. millions of rows) of tab delimited files that are pushed to a bucket in Amazon S3 every day.

Redshift makes this easy by providing a copy command which can be targeted directly to an S3 bucket to bulk load the data.

I would like to use Amazon Aurora RDS for this same purpose. Documentation on Aurora is thin, at best, right now. Is there a way to bulk load directly from S3 into Aurora?

As far as I can tell, MySql's LOAD DATA INFILE requires a path to the file on disk, which I suppose I can work around by downloading the tsv to an AWS instance and running the command from there, though that isn't ideal.

I've also attempted to read the tsv into memory and construct multiple insert statements. This is obviously slow and clunky.

Ideas?

UPDATE 11/2016:

As of Aurora version 1.8 you can now use the following commands to bulk load S3 data:

LOAD DATA FROM S3

or

LOAD XML FROM S3

Aurora Load From S3


UPDATE 7/2018:

Updated AWS Documentation link. Shoutout to Mark Holmes for catching that.


like image 466
Jaaromy Zierse Avatar asked Sep 17 '15 16:09

Jaaromy Zierse


People also ask

Can Aurora read from S3?

Importing data from Amazon S3 isn't supported for Aurora Serverless v1. It is supported for Aurora Serverless v2.

How do I add data to Amazon Aurora?

Loading data into an Amazon Aurora MySQL DB cluster from text files in an Amazon S3 bucket. You can use the LOAD DATA FROM S3 or LOAD XML FROM S3 statement to load data from files stored in an Amazon S3 bucket. If you are using encryption, the Amazon S3 bucket must be encrypted with an AWS managed key.

What are some limitations of using Aurora serverless?

All Aurora Serverless v1 DB clusters have the following limitations: You can't export Aurora Serverless v1 snapshots to Amazon S3 buckets. You can't save data to text files in Amazon S3. You can't use AWS Database Migration Service and Change Data Capture (CDC) with Aurora Serverless v1 DB clusters.


1 Answers

You could use AWS Data Pipeline. There is even a template for loading data from S3 to RDS:

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-copys3tords.html

like image 127
Marcin Pajdzik Avatar answered Oct 14 '22 03:10

Marcin Pajdzik