What is the correct way to install the delta module in Python?
In the examples they import the module with
from delta.tables import *
but I did not find the correct way to install the module in my virtual env.
Currently I am using this Spark param:
"spark.jars.packages": "io.delta:delta-core_2.11:0.5.0"
As the correct answer is hidden in the comments of the accepted solution, I thought I'd add it here.
You need to create your Spark session with some extra settings, and then you can import delta:
from pyspark.sql import SparkSession

spark_session = SparkSession.builder \
    .master("local") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()
from delta.tables import *
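For completeness, here is a minimal sketch of what you can do once the import succeeds; the /tmp/delta-table path is just an arbitrary example location, not anything Delta requires:

# Write a small DataFrame out in Delta format, then read it back
# through the DeltaTable API.
data = spark_session.range(0, 5)
data.write.format("delta").mode("overwrite").save("/tmp/delta-table")

delta_table = DeltaTable.forPath(spark_session, "/tmp/delta-table")
delta_table.toDF().show()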
Annoyingly, your IDE will of course shout at you about this, as the package isn't installed, and you will also be operating without autocomplete and type hints. I'm sure there's a workaround and I will update if I come across it.
The package itself is on their GitHub here, and the README suggests you can pip install it, but that doesn't work. In theory you could clone it and install it manually.
Because Delta's Python code is stored inside a jar and loaded by Spark, the delta module cannot be imported until the SparkSession/SparkContext is created.
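A minimal sketch of that ordering constraint, reusing the session settings from the answer above (the try/except is only there to illustrate what happens before the session exists):

# Before the session is created, the jar hasn't been fetched,
# so the delta module isn't on the Python path yet.
try:
    from delta.tables import DeltaTable
except ImportError:
    pass  # expected at this point

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()

# Now the jar (and the Python code inside it) is available.
from delta.tables import DeltaTable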