I am trying to download all the documents in my Cosmos DB collection to a local directory. I want to modify a few things in all of the JSON documents using Python, then upload them to another Azure account. What is the simplest, fastest way to download all of the documents in my collection? Should I use the Cosmos DB Emulator? I've been told to check out Azure Data Factory; would that help with downloading files locally? I've also been referred to Cosmos DB's Data Migration Tool, and I saw that it facilitates importing data into Cosmos DB, but I can't find much on exporting. I have about 6 GB of JSON documents in my collection.
Thanks.
There are a few ways to export data from Cosmos DB. The quickest is the DocumentDB / Cosmos DB Data Migration Tool, a tool provided by Microsoft to migrate data to and from Cosmos DB and various sources such as MongoDB, JSON files, CSV and SQL Server.
Sign in to the Azure portal. From All resources, find and navigate to your Azure Cosmos DB account, select Keys, and copy the Primary Connection String. Go to https://cosmos.azure.com/, paste the connection string, and select Connect.
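Instead of (or alongside) the web Data Explorer, that same Primary Connection String can drive a small export script. Here is a minimal sketch, assuming the azure-cosmos Python SDK and placeholder database, container and output-directory names that you would replace with your own:

# pip install azure-cosmos
import json
import os
from azure.cosmos import CosmosClient

CONNECTION_STRING = "<primary-connection-string-from-the-Keys-blade>"  # placeholder
DATABASE_NAME = "mydatabase"     # placeholder
CONTAINER_NAME = "mycollection"  # placeholder
OUTPUT_DIR = "cosmos_export"     # local folder to dump documents into

client = CosmosClient.from_connection_string(CONNECTION_STRING)
container = client.get_database_client(DATABASE_NAME).get_container_client(CONTAINER_NAME)

os.makedirs(OUTPUT_DIR, exist_ok=True)

# read_all_items() pages through the whole container lazily, so a 6 GB
# collection is streamed rather than loaded into memory at once.
for i, doc in enumerate(container.read_all_items(max_item_count=1000)):
    with open(os.path.join(OUTPUT_DIR, f"{i:07d}.json"), "w") as f:
        json.dump(doc, f)

This writes one file per document; whether that or a single combined dump is more convenient depends on how you plan to modify the data afterwards.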
Using the Azure Cosmos DB Emulator, you can develop and test your application locally, without creating an Azure subscription or incurring any costs. When you're satisfied with how your application is working in the Azure Cosmos DB Emulator, you can switch to using an Azure Cosmos DB account in the cloud.
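If you do experiment with the emulator, the only thing that changes in code is the endpoint and key you hand to the client; the rest of your application stays the same. A rough sketch with placeholder values (the emulator's default endpoint and well-known key are listed in its documentation; its certificate is self-signed, so you may need to trust it locally):

from azure.cosmos import CosmosClient

USE_EMULATOR = True

if USE_EMULATOR:
    endpoint = "https://localhost:8081/"   # default emulator endpoint
    key = "<well-known-emulator-key>"      # copy from the emulator docs
else:
    endpoint = "https://<your-account>.documents.azure.com:443/"  # placeholder
    key = "<primary-key-from-the-Keys-blade>"                     # placeholder

client = CosmosClient(endpoint, credential=key)

Note that the emulator is a local development/test environment; it does not by itself export data from a cloud account.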
In Cosmos DB, a container is essentially a collection of documents. A container is a single logical resource composed of multiple physical partitions; it acts as a template for your items and is defined with a partition key and provisioned throughput.
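Since the documents will eventually be uploaded to another account, the target container has to exist first, with its own partition key and throughput. A minimal sketch, again using the azure-cosmos Python SDK with placeholder names and paths:

from azure.cosmos import CosmosClient, PartitionKey

TARGET_CONNECTION_STRING = "<target-account-primary-connection-string>"  # placeholder

client = CosmosClient.from_connection_string(TARGET_CONNECTION_STRING)
database = client.create_database_if_not_exists("mydatabase")  # placeholder name

# A container is defined by an id, a partition key path and (optionally) provisioned throughput.
container = database.create_container_if_not_exists(
    id="mycollection",                                  # placeholder name
    partition_key=PartitionKey(path="/partitionKey"),   # placeholder path
    offer_throughput=400,                               # minimum RU/s for a dedicated container
)

Pick a partition key path that matches (or deliberately differs from) the source collection; documents you upload should carry a value at that path so they land in the intended logical partition.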
In the past I've used the DocumentDB (Cosmos DB) Data Migration Tool, which is available for download from Microsoft.
When running the app, you need to specify the Source Information and the Target Information.
Make sure that you choose to Import from DocumentDB and specify the connection string and collection you want to export from. If you want to dump the entire contents of your collection, the query would just be:
SELECT * FROM c
Then, under Target Information, you can choose a JSON file which will be saved to your local hard drive. You're free to modify the contents of that file in any way and then use it as the Source Information later, when you're ready to import it back into another collection.
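The Python step in between can be as simple as loading that export file, tweaking each document, and writing it back out or upserting it straight into the target account. A rough sketch, assuming the tool produced a single JSON array and using placeholder names plus an illustrative modification:

import json
from azure.cosmos import CosmosClient

EXPORT_FILE = "export.json"  # placeholder: file produced by the migration tool
TARGET_CONNECTION_STRING = "<target-account-primary-connection-string>"  # placeholder

with open(EXPORT_FILE) as f:
    documents = json.load(f)  # one JSON array of documents

for doc in documents:
    # Illustrative tweak: tag each document, and strip Cosmos DB system
    # properties so the target account generates fresh ones.
    doc["migrated"] = True
    for system_key in ("_rid", "_self", "_etag", "_attachments", "_ts"):
        doc.pop(system_key, None)

client = CosmosClient.from_connection_string(TARGET_CONNECTION_STRING)
target = client.get_database_client("mydatabase").get_container_client("mycollection")  # placeholders

for doc in documents:
    target.upsert_item(doc)  # creates or replaces by id + partition key

For 6 GB of documents you would likely want to stream and batch this rather than hold the whole array in memory, but the structure stays the same.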
I used the migration tool and found that it is great if you have a reasonably sized database, as it consumes processing and bandwidth for a considerable period. I had to chunk a 10 GB database and that took too long, so I ended up using Data Lake Analytics to transfer the data via script to SQL Server and Blob Storage. It gives you a lot of flexibility to transform the data and store it either in Data Lake or in other distributed systems. It also helps if you are using Cosmos DB for staging and need to run the data through any cleaning algorithms.
The other advantages are that you can set up batching and you get a lot of processing stats to determine how to optimize large data transformations. Hope this helps. Cheers.