
Azure Databricks vs ADLA for processing

Presently, I have all my data files in Azure Data Lake Store. I need to process these files, which are mostly in CSV format. The processing consists of running jobs on these files to extract various information: for example, data for certain date ranges, events related to a particular scenario, or data combined from multiple tables/files. These jobs run every day as U-SQL jobs in Data Factory (v1 or v2), and the results are then sent to Power BI for visualization.

Using ADLA for all this processing, I feel it takes a lot of time and seems very expensive. I got a suggestion that I should use Azure Databricks for the above processes instead. Could somebody explain the difference between the two and whether it would be helpful to switch? Can I convert all my U-SQL jobs into the Databricks notebook format?
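For concreteness, here is a rough sketch of what one of these daily jobs (filter a date range, join two files, aggregate) might look like as a Databricks PySpark notebook cell; the account name, paths, and column names below are hypothetical placeholders:

```python
# A rough sketch only: `spark` comes predefined in Databricks notebooks,
# and the account name, paths, and column names are hypothetical.
from pyspark.sql import functions as F

base = "adl://mylake.azuredatalakestore.net"  # hypothetical ADLS account

events = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(base + "/raw/events/*.csv"))

customers = (spark.read
             .option("header", "true")
             .option("inferSchema", "true")
             .csv(base + "/raw/customers/*.csv"))

# Filter a date window and join the two files, as the U-SQL jobs do today,
# then aggregate per day and scenario.
daily = (events
         .where(F.col("event_date").between("2018-09-01", "2018-09-14"))
         .join(customers, on="customer_id", how="inner")
         .groupBy("event_date", "scenario")
         .agg(F.count("*").alias("event_count")))

# Write the result back to the lake for Power BI to pick up.
daily.write.mode("overwrite").parquet(base + "/curated/daily_events")
```

Most U-SQL EXTRACT/SELECT/OUTPUT scripts map onto this read/transform/write pattern fairly directly, though they typically have to be ported by hand.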

Jobi asked Sep 14 '18 at 19:09

People also ask

What is the difference between Azure Data Lake and Azure Databricks?

From our simple example, we identified that Data Lake Analytics is more efficient when performing transformations and load operations, by using runtime processing and distributed operations. On the other hand, Databricks offers rich visibility into each step of the process, which leads to more accurate transformations.

When should I use Databricks vs Azure Functions?

ADF is primarily used for Data Integration services to perform ETL processes and orchestrate data movement at scale. In contrast, Databricks provides a collaborative platform for Data Engineers and Data Scientists to perform ETL as well as build Machine Learning models under a single platform.

What are the advantages of Azure Databricks?

Azure Databricks provides the latest versions of Apache Spark and allows you to seamlessly integrate with open source libraries. Spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure.

When should I use Databricks VS data Factory?

Azure Data Factory is an orchestration tool for Data Integration services that carries out ETL workflows and orchestrates data movement at scale. Azure Databricks provides a single collaborative platform for Data Scientists and Engineers to execute ETL and create Machine Learning models with visualization dashboards.


1 Answer

Disclaimer: I work for Databricks.

It is tough to give pros/cons or advice without knowing how much data you work with, what kind of data it is, or how long your processing times are. If you want to compare Azure Data Lake Analytics costs to Databricks costs, that comparison can only be made accurately by speaking with a member of the sales team.

Keep in mind that ADLA is based on the YARN cluster manager (from Hadoop) and only runs U-SQL batch-processing workloads. A description from BlueGranite:

ADLA is focused on batch processing, which is great for many Big Data workloads. 
Some example uses for ADLA include, but are not limited to:

- Prepping large amounts of data for insertion into a Data Warehouse
- Processing scraped web data for science and analysis
- Churning through text and quickly tokenizing it to enable context and sentiment analysis
- Using image processing intelligence to quickly process unstructured image data
- Replacing long-running monthly batch processing with shorter running distributed processes

Databricks covers both batch and stream processing, and handles both ETL (data engineering) and data science (Machine Learning, Deep Learning) workloads. Generally, here is why companies use Databricks:

  • Faster, more reliable, and better-scaling Apache Spark™. Databricks created a customized version of Apache Spark™ (the Databricks Runtime) with optimizations that allow for up to 100x faster processing than vanilla Apache Spark™.
  • Removes infrastructure bottlenecks that result from setup time or cost. Databricks creates Apache Spark™ clusters with all the necessary components in a few minutes. Apache Spark™, Python, Scala, plus all the Machine Learning and Deep Learning libraries you need are set up without involving Ops/DevOps. Clusters can autoscale to use extra resources only when needed, and unused clusters will auto-terminate after a set time to avoid incurring unnecessary costs.
  • A unified analytics platform for both data engineers and data scientists. Often these two teams work completely independently: there are miscommunications, a lack of visibility into each other's code and work, and inefficiencies in the development pipeline (getting data ingested, cleaned, and ready for analysis). Databricks provides collaborative notebooks that support multiple languages (SQL, R, Python, Scala, etc.) so that the two groups can work together.
  • Removes complexities from streaming use cases. Databricks has a new product called Delta that lets you keep the scale of a data lake without running into the reliability, performance, and data-consistency issues that often occur when processing large amounts of streaming, schema-less data while others are trying to read from it. Delta provides performance boosts on top of the Apache Spark™ runtime, and allows for things like upserts on data in the data lake, which are typically extremely difficult to do (see the sketch after this list).
  • Enterprise security, support, plus Spark expertise. Encryption, access controls, and more, with third-party-validated security. 75% of the Apache Spark™ codebase is contributed by Databricks, so the level of knowledge and expertise that can be provided is better than you would get anywhere else. That expertise could be assistance in optimizing queries, tuning your clusters, recommending how to set up your data pipelines, etc.
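To make the Delta upsert point concrete, here is a minimal sketch of a merge into a Delta table using the Delta Lake API on Databricks; the table path and join keys are hypothetical:

```python
# Sketch of an upsert ("merge") into a Delta table; the table path and
# join keys are hypothetical, and `spark` is predefined in notebooks.
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/mnt/lake/curated/daily_events")
updates = spark.read.parquet("/mnt/lake/staging/daily_events_changes")

(target.alias("t")
 .merge(updates.alias("u"),
        "t.event_date = u.event_date AND t.scenario = u.scenario")
 .whenMatchedUpdateAll()      # rows whose keys already exist get overwritten
 .whenNotMatchedInsertAll()   # brand-new keys get inserted
 .execute())
```

Running the same logic against plain CSV or Parquet files in the lake would mean rewriting whole partitions by hand; MERGE semantics are what Delta adds on top.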

There are more reasons than those, but those are some of the most common. You should try out a trial on the website if you think it may help your situation.
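One illustration of the batch-plus-streaming point above: with Spark Structured Streaming, largely the same DataFrame code runs as a continuous stream by swapping read for readStream. A minimal sketch, assuming hypothetical paths and a known CSV schema (streaming file sources require an explicit schema):

```python
# Sketch: the CSV ingestion run as a stream instead of a batch job.
# Paths are hypothetical; streaming file sources need an explicit schema.
from pyspark.sql.types import StructType, StructField, StringType, DateType

schema = StructType([
    StructField("event_date", DateType()),
    StructField("customer_id", StringType()),
    StructField("scenario", StringType()),
])

events_stream = (spark.readStream
                 .schema(schema)
                 .option("header", "true")
                 .csv("/mnt/lake/raw/events/"))

# Continuously append new files to a Delta table as they land in the lake.
(events_stream.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/lake/_checkpoints/events")
 .start("/mnt/lake/curated/events_delta"))
```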

GuavaKhan answered Dec 08 '22 at 00:12