Are there any books about Azure Data Lake internals?

Question

I don't want to use ADL and ADLA as a black box. I need to understand how the gears turn under the hood so that I can use them efficiently.

Where can I find information that describes the internals, such as:

how a U-SQL query is processed
how parallelism works
how storage is organized in ADL at a low level
how database storage is organized in ADL at a low level (is it rowstore or columnstore?)
how partitioning is organized
etc.

There are a lot of books and whitepapers that describe the internals of RDBMS engines. Do such resources exist for ADL/ADLA?

A lot of people work on Azure. Could you publish any drafts or whitepapers that we could use as is (unofficially)?

Michael Rys · Accepted Answer

Some of that information is available in presentations we have given. For example you can find some of these presentations on my slideshare account at: http://www.slideshare.net/MichaelRys.

To answer some of your questions above:

The current clustered-index version of U-SQL tables is stored in your catalog folder as so-called structured stream files. These are highly compressible, scaled-out files that use a row-oriented structure with self-contained metadata and statistics (more detailed statistics can be created). The table construct provides two-level partitioning: addressable partitions and internal distribution schemes (HASH, RANGE, etc.). Both help with parallelization, although distribution schemes are aimed more at performance, while partitions are aimed more at data lifecycle management. There is no limit on them, although the sweet spot is 1 GB to 4 GB per distribution bucket.
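To make the two-level layout more concrete, here is a rough Python sketch rather than anything taken from the engine itself. It assumes a table partitioned on one column and hash-distributed on another, and it picks a bucket count that aims for roughly 2 GB per bucket (inside the 1 GB to 4 GB sweet spot mentioned above). The function names, the sample columns, and the use of CRC32 as the hash are all illustrative assumptions; the hash function U-SQL actually uses is not described in this thread.

```python
import math
import zlib
from collections import defaultdict


def suggest_bucket_count(partition_size_gb, target_gb=2.0):
    """Pick a bucket count so that each bucket holds roughly target_gb."""
    return max(1, math.ceil(partition_size_gb / target_gb))


def place_row(row, partition_col, dist_col, n_buckets):
    """Return (partition value, bucket number) for a single row."""
    # CRC32 is only a stand-in for a stable hash (an assumption for this sketch).
    bucket = zlib.crc32(str(row[dist_col]).encode("utf-8")) % n_buckets
    return row[partition_col], bucket


# Hypothetical sample data: partition on event_date, distribute on vehicle_id.
rows = [
    {"event_date": "2017-01-01", "vehicle_id": 1},
    {"event_date": "2017-01-01", "vehicle_id": 2},
    {"event_date": "2017-01-02", "vehicle_id": 1},
]

n_buckets = suggest_bucket_count(partition_size_gb=10)  # 10 GB / 2 GB target

# Level 1 is the addressable partition, level 2 is the hash bucket inside it.
layout = defaultdict(list)
for r in rows:
    layout[place_row(r, "event_date", "vehicle_id", n_buckets)].append(r)
```

In this model, a whole partition can be added or dropped by its partition value (the lifecycle-management side), while rows with the same distribution key always land in the same bucket number, which is what lets lookups and joins on that key touch fewer buckets.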

1 AU (Analytics Unit) is basically 1 container. ADLS is NOT HDFS architecturally, but it offers the WebHDFS API for compatibility.
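Both points can be illustrated with a small Python sketch, under stated assumptions. The first function assumes that each vertex of a job stage occupies one AU while it runs, so a stage with more vertices than allocated AUs has to run in several waves; this is a simplified model, not the scheduler's actual algorithm. The second function builds a WebHDFS-style request URL using the ADLS Gen1 host pattern; the account name and path are hypothetical, and a real request would also need an OAuth bearer token, which is not shown.

```python
import math
from urllib.parse import quote, urlencode


def min_waves(vertex_count, allocated_aus):
    """Lower bound on execution waves if one vertex occupies one AU."""
    return math.ceil(vertex_count / allocated_aus)


def webhdfs_url(account, path, op):
    """Build a WebHDFS-compatible URL for an ADLS Gen1 account (no request is sent)."""
    clean_path = quote(path.lstrip("/"))  # keeps "/" separators, escapes the rest
    query = urlencode({"op": op})
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1/{clean_path}?{query}"


# A stage with 101 vertices on 10 AUs needs at least 11 waves in this model.
waves = min_waves(101, 10)

# "myadls" and the file path are placeholders for illustration only.
url = webhdfs_url("myadls", "/data/input.csv", "GETFILESTATUS")
```

The wave estimate is mainly useful as a sanity check when choosing how many AUs to allocate: once the AU count exceeds the widest stage's vertex count, extra AUs sit idle in this model.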

guyhay_MSFT · Answer

This is a pretty broad question. I assume you've started with the existing documentation on ADLA and U-SQL? See https://learn.microsoft.com/en-us/azure/data-lake-analytics/ and https://msdn.microsoft.com/library/azure/mt591959

ADLA GA'd in November of 2016, compared to SQL Server in 1987 - that's a very apples and oranges comparison.

Maybe we can start with your specific questions?

Tags: azure-data-lake, u-sql

churupaha
