
How to efficiently import many large JSON files directly from S3 into MongoDB

I have compressed JSON files in S3 and I would like to set up MongoDB on EC2 to serve the JSON documents contained in these files. The compressed files are >100M each and there are thousands of them. Each file contains hundreds of thousands of small documents.

What is the best way to get this data into Mongo? It would be nicest if there was a way to give Mongo the S3 paths and have it retrieve them itself. Is there anything better than downloading the data to the server and running mongoimport?

Also, how well will Mongo handle this amount of data?

asked Jun 06 '13 by Daniel Mahler

1 Answer

You don't need to store intermediate files: you can pipe the S3 file to stdout and have mongoimport read its input from stdin.

Your full command would look something like:

s3cmd get s3://<yourFilename> - | mongoimport -d <dbName> -c <collectionName>

Note the -, which tells s3cmd to write the file to stdout rather than to a filename.
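
Since your files are compressed, you can decompress inside the same pipeline and loop over every object under a prefix. Here is a minimal sketch, assuming gzip compression, newline-delimited JSON documents, and placeholder bucket, database, and collection names:

# List every object under the prefix and stream each one through the pipeline.
# s3cmd ls prints date, time, size, and the s3:// path; awk picks out the path.
for key in $(s3cmd ls s3://my-bucket/json-dumps/ | awk '{print $4}'); do
    # Stream the object to stdout, decompress it, and feed it straight to mongoimport.
    s3cmd get "$key" - | gunzip -c | mongoimport -d myDb -c myCollection
done

This keeps the data streaming end to end, so no intermediate copy ever has to be written to the EC2 instance's disk.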

answered by Asya Kamsky