Azure CosmosDB - Download all documents in collection to local directory

I am trying to download all the documents in my CosmosDB collection to a local directory. I want to modify a few things in all of the JSON documents using Python, then upload them to another Azure account. What is the simplest, fastest way to download all of the documents in my collection? Should I use the CosmosDB emulator? I've been told to check out Azure Data Factory; would that help with downloading files locally? I've also been referred to CosmosDB's Data Migration Tool, and I saw that it supports importing data into CosmosDB, but I can't find much about exporting. I have about 6 GB of JSON documents in my collection.

Thanks.

Rony Azrak asked Jul 12 '17 20:07


People also ask

How do I export from CosmosDB?

There are a few ways to export data from Cosmos DB. The quickest is the DocumentDB / Cosmos DB Data Migration Tool, a tool provided by Microsoft that imports data into Cosmos DB from sources such as MongoDB, JSON, CSV and SQL Server, and can also export a Cosmos DB collection to a JSON file.
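If you'd rather script the export than use the GUI tool, a minimal sketch with the `azure-cosmos` Python SDK (v4, installed with `pip install azure-cosmos`) could look like this. It runs `SELECT * FROM c` across all partitions and writes each document to its own numbered file. The connection string (e.g. the Primary Connection String from the account's Keys page), the database and container names, and the output directory are placeholders you would supply; this is an illustration, not an official export procedure.

```python
import json
import os
from typing import Iterable


def write_documents(docs: Iterable[dict], out_dir: str) -> int:
    """Write each document to its own numbered .json file; return the count."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for doc in docs:
        # Numbered names rather than doc["id"]: an id is only unique within
        # its logical partition, so two documents could share one.
        path = os.path.join(out_dir, f"{count:07d}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(doc, f, ensure_ascii=False)
        count += 1
    return count


def export_container(conn_str: str, database: str, container: str,
                     out_dir: str) -> int:
    """Run SELECT * FROM c over the whole container and save every document."""
    # Imported lazily so write_documents can be used without the SDK installed.
    from azure.cosmos import CosmosClient

    client = CosmosClient.from_connection_string(conn_str)
    items = (
        client.get_database_client(database)
        .get_container_client(container)
        .query_items(
            query="SELECT * FROM c",
            enable_cross_partition_query=True,  # read every partition
        )
    )
    # query_items returns a lazily paged iterable, so documents are fetched
    # page by page instead of holding the whole collection in memory.
    return write_documents(items, out_dir)
```

A call would look like `export_container(conn_str, "mydb", "mycollection", "cosmos_export")`, where `"mydb"`, `"mycollection"` and `"cosmos_export"` are hypothetical names. Note that the exported documents still contain Cosmos DB's system properties (`_rid`, `_ts` and so on).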

How do I retrieve data from Azure Cosmos DB?

Sign in to the Azure portal. Under All resources, find and open your Azure Cosmos DB account, select Keys, and copy the Primary Connection String. Then go to https://cosmos.azure.com/, paste the connection string and select Connect.

Can I run CosmosDB locally?

Using the Azure Cosmos DB Emulator, you can develop and test your application locally, without creating an Azure subscription or incurring any costs. When you're satisfied with how your application is working in the Azure Cosmos DB Emulator, you can switch to using an Azure Cosmos DB account in the cloud.

Are a container and a collection the same thing in Cosmos DB?

In Cosmos DB, a container is the equivalent of a collection of documents. A container is a single logical resource that is spread across one or more physical partitions, and it is defined with a partition key and provisioned throughput.


2 Answers

In the past I've used the DocumentDB (Cosmos DB) Data Migration Tool, which is available for download from Microsoft.

When running the app, you need to specify the source and the target, as in the screenshot below.

[Screenshot: Data Migration Tool source and target settings]

Make sure you choose to import from DocumentDB, and specify the connection string and the collection you want to export from. To dump the entire contents of your collection, the query is just:

SELECT * FROM c

Then, under Target Information, you can choose a JSON file, which will be saved to your local hard drive. You're free to modify the contents of that file however you like and then use it as the Source Information later, when you're ready to import it into another collection.
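A minimal sketch of the "modify the file" step in Python, assuming the tool wrote the export as a single JSON array (one element per document) that fits in memory; with a 6 GB collection you may prefer several smaller exports. It also drops any Cosmos DB system properties (`_rid`, `_self`, `_etag`, `_attachments`, `_ts`) that may be present, since the target account generates its own. `add_source_tag` and the `migratedFrom` field are hypothetical stand-ins for whatever change you actually need.

```python
import json
from typing import Callable, Dict

# System properties Cosmos DB adds to every stored document.
SYSTEM_PROPERTIES = ("_rid", "_self", "_etag", "_attachments", "_ts")


def transform_export(in_path: str, out_path: str,
                     change: Callable[[Dict], Dict]) -> int:
    """Load an exported JSON array, apply `change` to each document, write a new array."""
    with open(in_path, encoding="utf-8") as f:
        docs = json.load(f)  # assumes the whole export is one JSON array

    result = []
    for doc in docs:
        for key in SYSTEM_PROPERTIES:
            doc.pop(key, None)  # silently skip keys that are absent
        result.append(change(doc))

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    return len(result)


def add_source_tag(doc: Dict) -> Dict:
    # Hypothetical edit: record where the document was migrated from.
    doc["migratedFrom"] = "old-account"
    return doc
```

For example, `transform_export("export.json", "modified.json", add_source_tag)` would produce a file (names are placeholders) that can be selected as the JSON source when importing into the new collection.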

Jesse Carter answered Sep 28 '22 03:09


I used the migration tool and found that it works well for a reasonably sized database, but it consumes processing power and bandwidth for a considerable time. I had to split a 10 GB database into chunks, which took too long, so I ended up using Data Lake Analytics to transfer the data via script to SQL Server and Blob Storage. This gives you a lot of flexibility to transform the data and to store it either in Data Lake or in other distributed systems. It also helps if you are using Cosmos DB for staging and need to run the data through cleaning algorithms.

Other advantages are that you can set up batching, and you get detailed processing statistics that help you decide how to optimize large data transformations. Hope this helps. Cheers.
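The Data Lake Analytics pipeline itself isn't reproduced here. As a much smaller illustration of the chunking idea, this plain-Python sketch splits any stream of documents (for example, the results of a Cosmos DB query or a loaded export file) into fixed-size batches and writes each batch to its own JSON Lines file, so a large collection can be processed or re-imported one piece at a time. The batch size and the `chunk_NNNNN.jsonl` naming are arbitrary choices, not something from the answer.

```python
import json
import os
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_chunks(docs: Iterable[dict], out_dir: str, size: int) -> List[str]:
    """Write `docs` as JSON Lines files of `size` documents each; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n, batch in enumerate(batched(docs, size)):
        path = os.path.join(out_dir, f"chunk_{n:05d}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for doc in batch:
                # One JSON document per line.
                f.write(json.dumps(doc, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths
```

Because `batched` pulls items lazily, only one batch is held in memory at a time, which matters when the input is a multi-gigabyte collection.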

Peter Molloy answered Sep 28 '22 01:09