Can one use Delta Lake without being dependent on the Databricks Runtime? (I mean, is it possible to use Delta Lake with HDFS and Spark on-prem only?) If not, could you elaborate on why that is from a technical point of view?
Yes. When you use Delta Lake, you are using open Apache Spark APIs so you can easily port your code to other Spark platforms.
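For example, here is a minimal sketch (the HDFS path and column name are hypothetical) of writing and reading a Delta table with plain open-source Spark on an on-prem cluster; the same code runs unchanged on any Spark platform, as long as the Delta Lake library is on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object DeltaPortabilitySketch {
  def main(args: Array[String]): Unit = {
    // Any Spark deployment works here: on-prem YARN/HDFS, standalone, Kubernetes, ...
    val spark = SparkSession.builder().appName("delta-on-hdfs").getOrCreate()

    // Hypothetical table location on an on-prem HDFS cluster.
    val tablePath = "hdfs:///warehouse/events"

    // Writing a Delta table is just another Spark data source format.
    spark.range(0, 1000).toDF("id")
      .write.format("delta").mode("overwrite").save(tablePath)

    // Reading it back, including time travel to an earlier table version.
    spark.read.format("delta").load(tablePath).show()
    spark.read.format("delta").option("versionAsOf", 0).load(tablePath).show()

    spark.stop()
  }
}
```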
The Delta Standalone Reader (DSR) is a JVM library that allows you to read Delta Lake tables without the need to use Apache Spark; i.e. it can be used by any application that cannot run Spark.
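A minimal sketch of what that can look like, assuming the io.delta:delta-standalone artifact is on the classpath; the table path and column name are hypothetical, and row iteration via snapshot.open() requires a reasonably recent release of the library:

```scala
import org.apache.hadoop.conf.Configuration
import io.delta.standalone.DeltaLog
import io.delta.standalone.data.{CloseableIterator, RowRecord}

object StandaloneReaderSketch {
  def main(args: Array[String]): Unit = {
    // Plain Hadoop configuration: no SparkSession, no Spark cluster.
    val hadoopConf = new Configuration()

    // Hypothetical Delta table location on HDFS.
    val log = DeltaLog.forTable(hadoopConf, "hdfs:///warehouse/events")
    val snapshot = log.update() // latest committed version of the table

    // The transaction log tells us exactly which Parquet files are live.
    snapshot.getAllFiles.forEach(addFile => println(addFile.getPath))

    // Iterate over the rows of the snapshot without any Spark dependency.
    val rows: CloseableIterator[RowRecord] = snapshot.open()
    try {
      while (rows.hasNext) {
        val row = rows.next()
        // Columns are accessed by name; "id" is a hypothetical column.
        // println(row.getLong("id"))
      }
    } finally {
      rows.close()
    }
  }
}
```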
Databricks describes Delta Lake as enabling a data lakehouse, a data architecture that offers both storage and analytics capabilities, in contrast to data lakes, which store data in its native format, and data warehouses, which store structured data (typically queried via SQL).
Delta Lake on Azure Databricks allows you to configure Delta Lake based on your workload patterns. Azure Databricks adds optimized layouts and indexes to Delta Lake for fast interactive queries.
Yes, Delta Lake has been open-sourced by Databricks (https://delta.io/). I am using Delta Lake (0.6.1) along with Apache Spark (2.4.5) and S3. Many other integrations are also available to accommodate an existing tech stack, e.g. Hive, Presto, Athena, etc. Connectors: https://github.com/delta-io/connectors Integrations: https://docs.delta.io/latest/presto-integration.html and https://docs.delta.io/latest/integrations.html
According to this talk (https://vimeo.com/338100834), it is possible to use Delta Lake without the Databricks Runtime. Delta Lake is just a library that "knows" how to write to and read from a table (a collection of Parquet files) transactionally, by maintaining a special transaction log alongside each table; an illustrative layout is shown below. Of course, a special connector is needed for external applications (e.g. Hive) to work with such tables; otherwise, the transactional and consistency guarantees cannot be enforced.
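To make that concrete, here is an illustrative (not exhaustive) layout of a Delta table directory on HDFS: the "table" is nothing more than ordinary Parquet data files plus a `_delta_log` directory of JSON commit files, which readers consult to determine exactly which files make up the current table version:

```
/warehouse/events/                    <- the Delta "table" is just a directory
├── _delta_log/
│   ├── 00000000000000000000.json    <- commit 0: schema + files added by the first write
│   └── 00000000000000000001.json    <- commit 1: files added/removed by the next write
├── part-00000-....snappy.parquet    <- ordinary Parquet data files
└── part-00001-....snappy.parquet
```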
According to the documentation (https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake), Delta Lake is open source and designed to be used with Apache Spark. The integration can be done easily by adding the Delta Lake JAR to your job or by adding the library to the Spark installation path. Hive integration is available via https://github.com/delta-io/connectors.
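A hedged sketch of that setup for open-source Spark 3.x (the artifact coordinates, version numbers, and table path are illustrative; pick the ones matching your Spark and Scala build):

```scala
import org.apache.spark.sql.SparkSession

// Submit or launch the shell with the Delta JAR on the classpath, e.g.
//   spark-shell --packages io.delta:delta-core_2.12:1.0.0
// (coordinates are illustrative; match the version to your Spark build).
val spark = SparkSession.builder()
  .appName("delta-setup")
  // Needed for Delta SQL support (DDL, MERGE, VACUUM, ...) on open-source Spark 3.x
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

// After this, Delta tables can be used from SQL as well as the DataFrame API;
// the location below is a hypothetical HDFS path.
spark.sql(
  """CREATE TABLE IF NOT EXISTS events (id BIGINT)
    |USING DELTA
    |LOCATION 'hdfs:///warehouse/events'""".stripMargin)
```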