 

What is the correct way to install the delta module in Python?


In the example, they import the module like this:

from delta.tables import *

but I could not find the correct way to install the module in my virtual env.

Currently I am using this Spark parameter:

"spark.jars.packages": "io.delta:delta-core_2.11:0.5.0"

ofriman asked Dec 17 '19 11:12



2 Answers

As the correct answer is hidden in the comments of the accepted solution, I thought I'd add it here.

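You need to create your SparkSession with some extra settings; after that you can import delta:

from pyspark.sql import SparkSession

# The Delta jar bundles the Python `delta` package. The artifact's Scala
# suffix (_2.12) and version (0.8.0) must match your Spark build.
spark_session = SparkSession.builder \
    .master("local") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()

# Import delta only after the session (and therefore the jar) has been loaded.
from delta.tables import *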
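The question pins the Scala 2.11 / Delta 0.5.0 artifact while this answer uses Scala 2.12 / Delta 0.8.0, so it can help to assemble the same settings from those two version numbers. This is a minimal sketch with hypothetical helper names (`delta_spark_conf`, `apply_conf`); it only builds the configuration and does not check that a given Scala/Delta pairing is actually published. The `spark.sql.catalog.spark_catalog` setting relies on the catalog API introduced in Spark 3.0, so it probably does nothing on a Spark 2.4 setup like the one in the question.

```python
def delta_spark_conf(scala_binary: str, delta_version: str) -> dict:
    """Return the Spark settings used above for a chosen Delta artifact.

    Hypothetical helper. scala_binary is the Scala binary version of your
    Spark build ("2.11", "2.12", ...) and delta_version is the Delta Lake
    release; the caller must pick a combination Delta actually publishes.
    """
    # Maven coordinate format: groupId:artifactId_<scala binary version>:version
    package = f"io.delta:delta-core_{scala_binary}:{delta_version}"
    return {
        "spark.jars.packages": package,
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def apply_conf(builder, conf: dict):
    """Call builder.config(key, value) for every setting and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


conf = delta_spark_conf("2.12", "0.8.0")
print(conf["spark.jars.packages"])  # io.delta:delta-core_2.12:0.8.0
```

With PySpark installed you would then call `apply_conf(SparkSession.builder.master("local"), conf).getOrCreate()` and only afterwards run `from delta.tables import *`; `SparkSession.builder.config` returns the builder, which is why the loop can reassign it.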

Annoyingly, your IDE will of course complain about this import because the package isn't installed, and you will also be working without autocomplete and type hints. I'm sure there's a workaround, and I will update this answer if I come across it.

The package itself is on their GitHub, and the README suggests you can pip install it, but that doesn't work. In theory you could clone the repository and install it manually.

DataMacGyver answered Sep 17 '22 17:09



Because Delta's Python code is stored inside a jar that Spark loads, the delta module cannot be imported until a SparkSession/SparkContext has been created.
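The mechanism can be illustrated without Spark. As I understand it, when PySpark resolves `spark.jars.packages` it adds the downloaded jars to the Python path, and since a jar is just a zip archive, Python's built-in zip importer can load `.py` files from it. The sketch below simulates that with a hypothetical `fake_delta` package inside a temporary `fake-delta-core.jar`; it illustrates the idea and is not Spark's actual code path.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# A jar is a zip archive. Build a stand-in "jar" that bundles a Python package,
# similar to how the Delta jar bundles delta/tables.py (names are hypothetical).
tmpdir = tempfile.mkdtemp()
jar_path = os.path.join(tmpdir, "fake-delta-core.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("fake_delta/__init__.py", "")
    jar.writestr("fake_delta/tables.py", "class DeltaTable:\n    pass\n")

# Before the archive is on sys.path the import fails, just as
# `from delta.tables import *` fails before the SparkSession exists.
try:
    importlib.import_module("fake_delta.tables")
    imported_before = True
except ModuleNotFoundError:
    imported_before = False

# Creating the SparkSession is (roughly) what puts the real jar on sys.path.
sys.path.insert(0, jar_path)
importlib.invalidate_caches()
tables = importlib.import_module("fake_delta.tables")

print(imported_before, hasattr(tables, "DeltaTable"))  # False True
```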

zsxwing answered Sep 17 '22 17:09
