
Moving a DocumentDB Collection to Azure Data Lake Storage

I was wondering what the best practice is for moving a DocumentDB to Azure Data Lake Storage. Should I create a file for each document in a collection, or move the entire DocumentDB at once? Also, I didn't find much information on how to access DocumentDB using U-SQL.

Input would be appreciated.

reachify asked May 04 '17 20:05




1 Answer

You currently cannot use U-SQL to access data in DocumentDB (now called Cosmos DB). There is a feature request here. Please feel free to add your vote.

If you move the data over, how you organize it depends on four things: how you want to manage the data (will you delete all of it at once, or only parts?), how it is structured (keep similarly structured data together, either in the same file or in the same folder), how you use it (do you always need all of it, or only parts?), and what gives you the best performance when accessing it. Larger files are normally better, but if they are JSON, also make sure the extraction process still works on them.
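As a rough illustration of "keep similarly structured data together" and "larger files are normally better", here is a minimal Python sketch that groups already-exported documents by a field and writes one JSON Lines file per group instead of one file per document. The function name `write_grouped_jsonl`, the `type` field used for grouping, and the one-document-per-line layout are all assumptions made for the example; it does not read from DocumentDB or write to Data Lake Storage, and it assumes the group values are safe to use as file names.

```python
import json
from collections import defaultdict
from pathlib import Path


def write_grouped_jsonl(docs, out_dir, group_key="type"):
    """Write docs into one JSON Lines file per group; return {group: path}.

    `group_key` is a hypothetical field naming the document's shape.
    Documents without it are collected under "unknown".
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Bucket documents so that similarly structured ones share a file.
    groups = defaultdict(list)
    for doc in docs:
        groups[str(doc.get(group_key, "unknown"))].append(doc)

    written = {}
    for name, members in groups.items():
        path = out / f"{name}.jsonl"
        with path.open("w", encoding="utf-8") as fh:
            for doc in members:
                # One document per line keeps each record separately parseable.
                fh.write(json.dumps(doc, sort_keys=True) + "\n")
        written[name] = path
    return written
```

Writing one document per line is only one possible layout. Whichever you choose, check that the extractor you plan to use can parse the resulting files before you move the whole collection.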

Michael Rys answered Sep 19 '22 01:09