I was wondering what's the best practice moving a documentDB to the Azure Data Lake Storage. Should I create a file for each document in a collection or move the entire documentDB? Also I didn't find much information on how I can access the documentDB using U-SQL?
Input would be appreciated.
Azure Cosmos DB holds 3 to 4 months of the most recent data used by the web applications. Data Lake Storage holds historical data used by the web applications. Periodically, Azure Data Factory moves data from Azure Cosmos DB to Azure Data Lake to reduce storage costs.
Azure Blob Storage is a general purpose, scalable object store that is designed for a wide variety of storage scenarios. Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads. Based on shared secrets - Account Access Keys and Shared Access Signature Keys.
You currently cannot use U-SQL to access data in DocumentDB (or now called CosmosDB). There is a feature request here. Please feel free to add your vote.
If you move the data over, the organization depends on how you want to manage the data (delete all, or only parts?), how it is structured (keep similar structured data together, either in same file or same folder) and how you use it (always need all of it? or only parts?) and what gives you the best performance accessing it (larger files are normally better, but if they are JSON, also make sure the extraction process works).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With