
Delta Lake without Databricks Runtime

Can one use Delta Lake without being dependent on the Databricks Runtime? (I mean, is it possible to use Delta Lake with HDFS and Spark on-prem only?) If not, could you elaborate on why that is, from a technical point of view?

asked Mar 23 '20 by user3207359

People also ask

Can I access Delta tables outside of Databricks runtime?

Yes. When you use Delta Lake, you are using open Apache Spark APIs so you can easily port your code to other Spark platforms.

Can Delta Lake work without Spark?

The Delta Standalone Reader (DSR) is a JVM library that allows you to read Delta Lake tables without the need to use Apache Spark; i.e. it can be used by any application that cannot run Spark.
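
To make that concrete, here is a minimal sketch of reading a table with the Delta Standalone library on the JVM; the table path is a placeholder, and the class and method names are taken from the delta-io/connectors project, so treat the exact signatures as an assumption rather than a verified snippet:

```scala
import org.apache.hadoop.conf.Configuration
import io.delta.standalone.DeltaLog
import scala.collection.JavaConverters._

// Point the standalone reader at the table's root directory -- no Spark cluster involved.
// (Path is a placeholder; API names assumed from the delta-io/connectors project.)
val log = DeltaLog.forTable(new Configuration(), "hdfs:///data/events")

// The latest snapshot lists the Parquet data files that make up the current table version.
val snapshot = log.snapshot()
snapshot.getAllFiles.asScala.foreach(file => println(file.getPath))
```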

What is the difference between Databricks and Delta Lake?

Databricks refers to Delta Lake as a data lakehouse: a data architecture that offers both storage and analytics capabilities, in contrast to data lakes, which store data in its native format, and data warehouses, which store structured data (often accessed via SQL).

Does Databricks use Delta Lake?

Delta Lake on Azure Databricks allows you to configure Delta Lake based on your workload patterns. Azure Databricks adds optimized layouts and indexes to Delta Lake for fast interactive queries.


3 Answers

Yes, Delta Lake has been open-sourced by Databricks (https://delta.io/). I am using Delta Lake (0.6.1) along with Apache Spark (2.4.5) and S3. Many other integrations are also available to fit an existing tech stack, e.g. Hive, Presto, and Athena.

Connectors: https://github.com/delta-io/connectors
Integrations: https://docs.delta.io/latest/presto-integration.html and https://docs.delta.io/latest/integrations.html
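
For illustration, a minimal sketch of that kind of setup (the bucket name and table path are placeholders, and the s3a filesystem plus AWS credentials are assumed to be configured separately):

```scala
// Started with: spark-shell --packages io.delta:delta-core_2.11:0.6.1
// (the Scala 2.11 artifact matches Spark 2.4.x)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("delta-on-s3").getOrCreate()

// Write a small dataset as a Delta table on S3, then read it back.
spark.range(0, 100).write.format("delta").mode("overwrite").save("s3a://my-bucket/tables/events")
spark.read.format("delta").load("s3a://my-bucket/tables/events").show()
```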

answered Oct 17 '22 by Swapnil Chougule


According to this talk (https://vimeo.com/338100834), it is possible to use Delta Lake without the Databricks Runtime. Delta Lake is essentially a library that knows how to read and write a table (a collection of Parquet files) transactionally, by maintaining a special transaction log alongside each table. Of course, a dedicated connector is needed for external applications (e.g. Hive) to work with such tables; otherwise the transactional and consistency guarantees cannot be enforced.
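
As a rough illustration of that layout (the HDFS path is a placeholder), the transaction log can be inspected directly, since it is just a _delta_log/ directory of ordered JSON commit files sitting next to the Parquet data files:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// A Delta table directory contains Parquet data files plus a _delta_log/ subdirectory
// holding ordered commit files (00000000000000000000.json, 00000000000000000001.json, ...).
val fs = FileSystem.get(new Configuration())
fs.listStatus(new Path("hdfs:///data/events/_delta_log"))
  .foreach(status => println(status.getPath.getName))
```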

answered Oct 17 '22 by user3207359


According to the documentation (https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake), Delta Lake has been open-sourced for use with Apache Spark. The integration can be done easily by adding the Delta Lake JAR to your build or by adding the library to the Spark installation path. Hive integration can be done using https://github.com/delta-io/connectors.
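
A minimal sketch of that quick-start setup; the artifact coordinates below assume a Spark 3.x / Scala 2.12 build and the table location is a placeholder, so adjust versions and paths to your own cluster:

```scala
// Started with: spark-shell --packages io.delta:delta-core_2.12:1.0.0
// or by placing the delta-core JAR on the Spark classpath.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-quickstart")
  // These two settings wire Delta's SQL support into Spark (Delta 0.7.0+ / Spark 3.x).
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

// Create a Delta table backed by plain HDFS storage -- no Databricks Runtime involved.
spark.sql("CREATE TABLE IF NOT EXISTS events (id LONG) USING DELTA LOCATION 'hdfs:///data/events'")
```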

answered Oct 17 '22 by jainnidhi