Azure data lake analytics and azure databricks both can be used for batch processing. Could anyone please help me understand when to choose one over another?
From our simple example, we identified that Data Lake Analytics is more efficient when performing transformations and load operations by using runtime processing and distributed operations. On the other hand, Databricks has rich visibility using a step by step process that leads to more accurate transformations.
Managed Delta Lake in Azure Databricks provides a layer of reliability that enables you to curate, analyze and derive value from your data lake on the cloud.
The last and most significant difference between the two tools is that ADF is generally used for data movement, ETL process, and data orchestration whereas; Databricks helps in data streaming and data collaboration in real-time.
In my humble opinion, a lot of it comes down to existing skillsets. If you have a team experienced in Spark, Java, Python, r or Scala then Databricks is a natural fit. If on the other hand you have a team with existing SQL and c# skills, then the learning curve for them with U-SQL will be less steep.
That aside, there are other questions which can drive out differences:
UPDATE October 2018: As far as I am aware, U-SQL does not currently support ADLS Gen 2, which would count against it (happy to be corrected). I will update the post if and when that support is added.
UPDATE January 2019: U-SQL has not had any meaningful updates since Spring 2018.
HTH
Databricks has more language options that allows professional with different skills to work on the data. Also with databricks you can run jobs with high-performance, in-memory clusters.
In a project, we use data lake more as a storage, and do all the jobs (ETL, analytics) via databricks notebook. Storing data in data lake is cheaper $.
Back to your questions, if a complex batch job, and different type of professional will work on the data you. You may choose a Azure Data Lake + Databricks architecture. Otherwise an Azure Data Lake would satisfied your needs.
Take a look of these 2 articles would help. https://databricks.com/glossary/data-lake https://visualbi.com/blogs/microsoft/azure/etl-azure-databricks-vs-data-lake-analytics/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With