I have compressed JSON files in S3, and I would like to set up MongoDB on EC2 to serve the JSON documents contained in these files. The compressed files are >100M each and there are thousands of them. Each file contains hundreds of thousands of small documents.
What is the best way to get this data into Mongo? It would be nicest if there were a way to give Mongo the S3 paths and have it retrieve them itself. Is there anything better than downloading the data to the server and running mongoimport?
Also, how well does Mongo handle this amount of data?
You don't need to store intermediate files: you can pipe the S3 file to stdout and have mongoimport read from stdin.
Your full command would look something like:
s3cmd get s3://<yourFilename> - | mongoimport -d <dbName> -c <collectionName>
Note the -, which tells s3cmd to send the file to stdout rather than to a filename.
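Since the files in the question are compressed and there are thousands of them, here is a minimal sketch extending the same idea, assuming gzip compression and a hypothetical bucket/prefix; it lists every object under the prefix and streams each one, decompressed, into mongoimport without writing anything to local disk:

# Loop over every object under a hypothetical prefix and pipe each one
# through gunzip into mongoimport (assumes the files are gzip-compressed).
for f in $(s3cmd ls s3://<yourBucket>/<yourPrefix>/ | awk '{print $4}'); do
    s3cmd get "$f" - | gunzip | mongoimport -d <dbName> -c <collectionName>
done

By default mongoimport expects one JSON document per line; if each file is instead a single JSON array, add the --jsonArray flag.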