
Azure Data Lake Gen 1 vs Gen 2

Azure recently announced the Data Lake Storage Gen 2 preview. As far as I know, the main functional difference between Gen 1 and Gen 2 is that Gen 2 offers object store and file system access over the same data at the same time. Other differences include price and available regions. Can anyone explain the other key differences between Gen 1 and Gen 2?

Shehan Weerasooriya asked Aug 10 '18 08:08



2 Answers

Basically, think of Gen 2 as a superset of Gen 1 plus the best parts of Blob storage: access tiers, both HDFS and object store APIs, and presumably the ability to manage more than 35K files efficiently, cope with many small files, and handle trickle-write operations. It is also cheaper.
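To make the "HDFS and object store APIs" point concrete, here is a minimal sketch of how the same stored file can be addressed through both interfaces. It assumes a storage account with the hierarchical namespace enabled; the account name `mydatalake`, the file system `raw`, and the helper `gen2_endpoints` are hypothetical names used only for illustration.

```python
from urllib.parse import quote


def gen2_endpoints(account: str, filesystem: str, path: str) -> dict:
    """Build the Blob-style and ABFS-style addresses for one stored object.

    Hypothetical helper: it only formats strings and does not contact Azure.
    """
    clean = path.lstrip("/")
    return {
        # Object store view: the Blob endpoint, container + blob name.
        "blob": f"https://{account}.blob.core.windows.net/{filesystem}/{quote(clean)}",
        # File system view: the ABFS URI form used by HDFS-compatible clients.
        "abfs": f"abfss://{filesystem}@{account}.dfs.core.windows.net/{clean}",
    }


# Assumed example values, not taken from the question.
urls = gen2_endpoints("mydatalake", "raw", "/sales/2018/08/orders.csv")
print(urls["blob"])
print(urls["abfs"])
```

Both strings point at the same bytes; which one you use depends on whether the client speaks the Blob REST API or an HDFS-compatible driver.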

I'm still trying to get clarity on a few specifics and am not finding much. In the meantime, try these links:

https://azure.microsoft.com/en-us/blog/a-closer-look-at-azure-data-lake-storage-gen2/

https://docs.microsoft.com/en-us/azure/storage/data-lake-storage/introduction

Jason Horner answered Nov 11 '22 21:11



Azure Data Lake Storage Gen2 is a superset of Azure Data Lake Gen 1. Microsoft also calls it a "no-compromise data lake". Gen 2 extends Azure Blob storage capabilities and is optimized for analytics workloads. Data is stored once and can be accessed through both the existing Blob storage interface and an HDFS-compliant file system interface, with no programming changes or data copying. It also supports atomic file and folder operations.
At present, it is only available in the West US 2 and West Central US data centers, but according to Microsoft it will be expanded to other data centers in the near future.
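As a rough illustration of why the atomic folder operations mentioned in this answer matter, the toy model below contrasts renaming a "folder" in a flat object store, where folders are only key prefixes, with renaming a real directory node. It is a simplified in-memory sketch, not Azure's implementation; the functions `rename_flat` and `rename_hierarchical` and the sample data are made up for the example.

```python
def rename_flat(store: dict, old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: every object under the prefix must be rewritten.

    Returns the number of per-object operations performed.
    """
    ops = 0
    # Snapshot the matching keys first so the dict can be modified safely.
    for key in [k for k in store if k.startswith(old_prefix)]:
        store[new_prefix + key[len(old_prefix):]] = store.pop(key)
        ops += 1
    return ops


def rename_hierarchical(tree: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: the directory entry itself is renamed once."""
    tree[new_name] = tree.pop(old_name)
    return 1


flat = {"logs/a.txt": b"1", "logs/b.txt": b"2", "logs/c.txt": b"3"}
tree = {"logs": {"a.txt": b"1", "b.txt": b"2", "c.txt": b"3"}}

print(rename_flat(flat, "logs/", "archive/"))        # 3 operations, one per object
print(rename_hierarchical(tree, "logs", "archive"))  # 1 operation for the whole folder
```

In the flat case a failure halfway through leaves some objects under each prefix, which is the kind of partial state an atomic folder rename avoids.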

Shehan Weerasooriya answered Nov 11 '22 20:11
