
How to efficiently import many large JSON files directly from S3 into MongoDB

I have compressed JSON files in S3 and I would like to set up MongoDB on EC2 to serve the JSON documents contained in these files. The compressed files are >100M each and there are thousands of them. Each file contains hundreds of thousands of small documents.

What is the best way to get this data into Mongo? It would be nicest if there was a way to give Mongo the S3 paths and have it retrieve them itself. Is there anything better than downloading the data to the server and running mongoimport?

Also, how well will Mongo handle this amount of data?

asked Jun 06 '13 by Daniel Mahler

1 Answer

You don't need to store intermediate files: you can pipe the S3 file to stdout and have mongoimport read its input from stdin.

Your full command would look something like:

s3cmd get s3://<yourFilename> - | mongoimport -d <dbName> -c <collectionName>

Note the -, which tells s3cmd to write the file to stdout rather than to a filename.
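
Since your files are compressed, you can decompress inside the same pipeline and loop over every object under a prefix. Here is a minimal sketch, assuming gzip compression, newline-delimited JSON documents, and placeholder bucket, database, and collection names:

# List every object under the prefix and stream each one through the pipeline.
# s3cmd ls prints date, time, size, and the s3:// path; awk picks out the path.
for key in $(s3cmd ls s3://my-bucket/json-dumps/ | awk '{print $4}'); do
    # Stream the object to stdout, decompress it, and feed it straight to mongoimport.
    s3cmd get "$key" - | gunzip -c | mongoimport -d myDb -c myCollection
done

This keeps the data streaming end to end, so no intermediate copy ever has to be written to the EC2 instance's disk.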

answered by Asya Kamsky