Can one use Delta Lake without being dependent on the Databricks Runtime? (I mean, is it possible to use Delta Lake with HDFS and Spark on-prem only?) If not, could you elaborate on why that is from a technical point of view?
Yes. When you use Delta Lake, you are using open Apache Spark APIs so you can easily port your code to other Spark platforms.
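For example, here is a minimal sketch (the HDFS path and column name are hypothetical) of writing and reading a Delta table with plain open-source Spark on an on-prem cluster; the same code runs unchanged on any Spark platform, as long as the Delta Lake library is on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object DeltaPortabilitySketch {
  def main(args: Array[String]): Unit = {
    // Any Spark deployment works here: on-prem YARN/HDFS, standalone, Kubernetes, ...
    val spark = SparkSession.builder().appName("delta-on-hdfs").getOrCreate()

    // Hypothetical table location on an on-prem HDFS cluster.
    val tablePath = "hdfs:///warehouse/events"

    // Writing a Delta table is just another Spark data source format.
    spark.range(0, 1000).toDF("id")
      .write.format("delta").mode("overwrite").save(tablePath)

    // Reading it back, including time travel to an earlier table version.
    spark.read.format("delta").load(tablePath).show()
    spark.read.format("delta").option("versionAsOf", 0).load(tablePath).show()

    spark.stop()
  }
}
```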
The Delta Standalone Reader (DSR) is a JVM library that allows you to read Delta Lake tables without the need to use Apache Spark; i.e. it can be used by any application that cannot run Spark.
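A minimal sketch of what that can look like, assuming the io.delta:delta-standalone artifact is on the classpath; the table path and column name are hypothetical, and row iteration via snapshot.open() requires a reasonably recent release of the library:

```scala
import org.apache.hadoop.conf.Configuration
import io.delta.standalone.DeltaLog
import io.delta.standalone.data.{CloseableIterator, RowRecord}

object StandaloneReaderSketch {
  def main(args: Array[String]): Unit = {
    // Plain Hadoop configuration: no SparkSession, no Spark cluster.
    val hadoopConf = new Configuration()

    // Hypothetical Delta table location on HDFS.
    val log = DeltaLog.forTable(hadoopConf, "hdfs:///warehouse/events")
    val snapshot = log.update() // latest committed version of the table

    // The transaction log tells us exactly which Parquet files are live.
    snapshot.getAllFiles.forEach(addFile => println(addFile.getPath))

    // Iterate over the rows of the snapshot without any Spark dependency.
    val rows: CloseableIterator[RowRecord] = snapshot.open()
    try {
      while (rows.hasNext) {
        val row = rows.next()
        // Columns are accessed by name; "id" is a hypothetical column.
        // println(row.getLong("id"))
      }
    } finally {
      rows.close()
    }
  }
}
```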
Databricks describes Delta Lake as enabling a data lakehouse, a data architecture that offers both storage and analytics capabilities, in contrast to data lakes, which store data in its native format, and data warehouses, which store structured data (typically queried via SQL).
Delta Lake on Azure Databricks allows you to configure Delta Lake based on your workload patterns. Azure Databricks adds optimized layouts and indexes to Delta Lake for fast interactive queries.
Yes, Delta Lake has been open-sourced by Databricks (https://delta.io/). I am using Delta Lake (0.6.1) along with Apache Spark (2.4.5) and S3. Many other integrations are also available to accommodate an existing tech stack, e.g. Hive, Presto, Athena, etc. Connectors: https://github.com/delta-io/connectors Integrations: https://docs.delta.io/latest/presto-integration.html and https://docs.delta.io/latest/integrations.html
According to this talk (https://vimeo.com/338100834), it is possible to use Delta Lake without the Databricks Runtime. Delta Lake is just a library that "knows" how to write to and read from a table (a collection of Parquet files) transactionally, by maintaining a special transaction log alongside each table; an illustrative layout is shown below. Of course, a special connector is needed for external applications (e.g. Hive) to work with such tables; otherwise, the transactional and consistency guarantees cannot be enforced.
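To make that concrete, here is an illustrative (not exhaustive) layout of a Delta table directory on HDFS: the "table" is nothing more than ordinary Parquet data files plus a `_delta_log` directory of JSON commit files, which readers consult to determine exactly which files make up the current table version:

```
/warehouse/events/                    <- the Delta "table" is just a directory
├── _delta_log/
│   ├── 00000000000000000000.json    <- commit 0: schema + files added by the first write
│   └── 00000000000000000001.json    <- commit 1: files added/removed by the next write
├── part-00000-....snappy.parquet    <- ordinary Parquet data files
└── part-00001-....snappy.parquet
```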
According to the documentation (https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake), Delta Lake is open source and designed to be used with Apache Spark. The integration can be done easily by adding the Delta Lake JAR to your job or by adding the library to the Spark installation path. Hive integration is available via https://github.com/delta-io/connectors.
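A hedged sketch of that setup for open-source Spark 3.x (the artifact coordinates, version numbers, and table path are illustrative; pick the ones matching your Spark and Scala build):

```scala
import org.apache.spark.sql.SparkSession

// Submit or launch the shell with the Delta JAR on the classpath, e.g.
//   spark-shell --packages io.delta:delta-core_2.12:1.0.0
// (coordinates are illustrative; match the version to your Spark build).
val spark = SparkSession.builder()
  .appName("delta-setup")
  // Needed for Delta SQL support (DDL, MERGE, VACUUM, ...) on open-source Spark 3.x
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

// After this, Delta tables can be used from SQL as well as the DataFrame API;
// the location below is a hypothetical HDFS path.
spark.sql(
  """CREATE TABLE IF NOT EXISTS events (id BIGINT)
    |USING DELTA
    |LOCATION 'hdfs:///warehouse/events'""".stripMargin)
```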