What is the correct way to install the delta module in Python?

In the example they import the module

from delta.tables import *

but I could not find the correct way to install the module in my virtual environment.

Currently I am using this Spark parameter:

"spark.jars.packages": "io.delta:delta-core_2.11:0.5.0"

2 Answers

As the correct answer is hidden in the comments of the accepted solution, I thought I'd add it here.

You need to create your Spark session with some extra settings, and only then can you import delta:

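from pyspark.sql import SparkSession

# Build the session with the Delta package and SQL extensions configured.
spark_session = SparkSession.builder \
    .master("local") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()

# The delta module is importable only after the session exists.
from delta.tables import *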
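The question uses delta-core_2.11:0.5.0 while this answer uses delta-core_2.12:0.8.0. The suffix after the underscore is the Scala version, and it has to match the Scala version your Spark build was compiled against. As a rough sketch, the settings can be collected in a small helper so the two version numbers are set in one place. The function name delta_spark_conf and its defaults are made up for illustration; the two SQL settings are simply the ones used above with 0.8.0, and the helper does not check whether a given release accepts them.

```python
def delta_spark_conf(scala_version: str = "2.12", delta_version: str = "0.8.0") -> dict:
    """Return the Spark settings used above as a plain dict.

    Hypothetical helper: it only formats strings and does not verify that
    the chosen Scala/Delta combination exists or matches your Spark build.
    """
    # Maven coordinate format: group:artifact_<scala version>:<delta version>
    package = f"io.delta:delta-core_{scala_version}:{delta_version}"
    return {
        "spark.jars.packages": package,
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


if __name__ == "__main__":
    # Print the settings so they can be checked before building a session.
    for key, value in delta_spark_conf().items():
        print(f"{key} = {value}")
```

To use it, loop over the returned items and call .config(key, value) on SparkSession.builder for each pair before calling .getOrCreate().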

Annoyingly, your IDE will complain about this import because the package isn't installed in your environment, and you will also be working without autocomplete and type hints. I'm sure there's a workaround, and I will update this answer if I come across it.

The package itself is on their GitHub, and the README suggests you can pip install it, but that doesn't work. In theory, you could clone the repository and install it manually.

Because Delta's Python code is stored inside a jar and loaded by Spark, the delta module cannot be imported until the SparkSession/SparkContext is created.
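One way to work with that ordering constraint is to defer the import into a function that is called only after the session has been created. The sketch below is an illustration, not part of Delta: the names delta_is_importable and load_delta_table_class are hypothetical, and it assumes that Spark has put the jar on Python's import path by the time the function runs.

```python
import importlib
import importlib.util


def delta_is_importable() -> bool:
    """Return True if a top-level `delta` module can currently be found."""
    return importlib.util.find_spec("delta") is not None


def load_delta_table_class():
    """Import DeltaTable lazily; call this only after the SparkSession exists."""
    if not delta_is_importable():
        raise ImportError(
            "delta is not importable yet; create the SparkSession with "
            "spark.jars.packages pointing at the Delta artifact first"
        )
    # Same module the question imports from, loaded at call time instead of
    # at the top of the file.
    return importlib.import_module("delta.tables").DeltaTable
```

Calling load_delta_table_class() before the session exists raises an ImportError with a hint, rather than failing at the top of the file where the cause is less obvious.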

