I'm new to cask cdap and Hadoop environment.
I'm creating a pipeline and I want to use a PySpark Program. I have all the script of the spark program and it works when I test it by command like, insted it doesn't if I try to copy- paste it in a cdap pipeline.
It gives me an error in the logs:
NameError: name 'SparkSession' is not defined
My script starts in this way:
from pyspark.sql import *
spark = SparkSession.builder.getOrCreate()
from pyspark.sql.functions import trim, to_date, year, month
sc= SparkContext()
How can I fix it?
You forgot to add
import pyspark
from pyspark.sql import SparkSession
# ---Your code----
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With