 

Folder Statistics in Azure Data Lake

I'm trying to summarize how much data has been written to a folder in my Data Lake. What is the best way to do this? Should I use a U-SQL job, or HDInsight?

BadRaabutation asked Jan 28 '23 13:01

2 Answers

There are two ways to do this:

  1. If it is a one-time operation, you can use Azure Storage Explorer (https://azure.microsoft.com/en-us/features/storage-explorer/), navigate to the Data Lake Store folder and get the size for it.
  2. If you want a programmatic way to do this, Data Lake Store provides a WebHDFS-compliant API whose GETCONTENTSUMMARY operation returns aggregate folder attributes such as total size, file count, and directory count. You can see more details here: https://learn.microsoft.com/en-us/rest/api/datalakestore/webhdfs-filesystem-apis.
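For option 2, a minimal sketch of calling the GETCONTENTSUMMARY operation with Python's `requests` library might look like the following. The account name, folder path, and OAuth bearer token are placeholders you would supply yourself, and the endpoint format assumes a Gen1 Data Lake Store account:

```python
def parse_content_summary(payload):
    """Pull the interesting totals out of a WebHDFS ContentSummary response body."""
    summary = payload["ContentSummary"]
    return {
        "bytes": summary["length"],
        "files": summary["fileCount"],
        "directories": summary["directoryCount"],
    }

def get_folder_summary(account, folder, token):
    """Query GETCONTENTSUMMARY for a folder in an ADLS Gen1 account."""
    import requests  # third-party: pip install requests

    url = f"https://{account}.azuredatalakestore.net/webhdfs/v1/{folder}"
    resp = requests.get(
        url,
        params={"op": "GETCONTENTSUMMARY"},
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return parse_content_summary(resp.json())
```

The `length` field in the response is the total size in bytes of everything under the folder, which is usually the number you want for a "how much has been written" check.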

Hope this helps

José

José Lara_MSFT answered Jan 31 '23 03:01


You can use Python code to loop through the files and sum their sizes. Refer here: https://cloudarchitected.com/2019/05/computing-total-storage-size-of-a-folder-in-azure-data-lake-storage-gen2/
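A rough sketch of that approach with the `azure-storage-file-datalake` SDK (for ADLS Gen2) is below. The account URL, filesystem name, and credential are placeholders, and `get_paths`/`content_length` are the SDK members I believe apply here, so verify against the current SDK docs:

```python
def total_size(paths):
    """Sum content_length over every non-directory entry in an iterable of paths."""
    return sum(p.content_length for p in paths if not p.is_directory)

def folder_size(account_url, filesystem, folder, credential):
    """Recursively total the bytes stored under a folder in an ADLS Gen2 filesystem."""
    # Third-party: pip install azure-storage-file-datalake
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    fs = service.get_file_system_client(filesystem)
    # recursive=True walks the whole subtree, not just the immediate children.
    return total_size(fs.get_paths(path=folder, recursive=True))
```

Note this enumerates every path under the folder, so for very large trees it can take a while; the Storage Explorer option below does essentially the same enumeration for you.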

If you would like to quickly cross-check this:

Download Azure Storage Explorer: https://azure.microsoft.com/en-in/features/storage-explorer/

Open the folder whose size details you would like to view.

On the top menu bar, choose More -> Folder Statistics to get details of the directory, including its size in bytes. Refer to the attachment [sample snapshot of the Azure Storage Explorer menu][1].

[1]: https://i.stack.imgur.com/R1DuZ.jpg

Sudhu answered Jan 31 '23 03:01