NameError: name 'SparkSession' is not defined

Question

I'm new to Cask CDAP and the Hadoop environment.

I'm creating a pipeline and I want to use a PySpark program. I have the full script of the Spark program, and it works when I test it from the command line, but it fails when I copy and paste it into a CDAP pipeline.

It gives me an error in the logs:

NameError: name 'SparkSession' is not defined

My script starts in this way:

from pyspark.sql import *

spark = SparkSession.builder.getOrCreate()
from pyspark.sql.functions import trim, to_date, year, month
sc= SparkContext()

How can I fix it?

dol · Accepted Answer

You forgot to import SparkSession explicitly. Add these lines at the top of your script:

import pyspark
from pyspark.sql import SparkSession
# ---Your code----
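
If the error persists, it may help to check whether the Python environment used by the pipeline exposes SparkSession at all. SparkSession was introduced in Spark 2.0, so one possible cause (an assumption, since the logs in the question do not show the Spark version) is that the pipeline runs against an older Spark than your command-line test. The sketch below is only a diagnostic: `spark_session_available` and `start_spark` are hypothetical helper names, not part of PySpark or CDAP.

```python
import importlib


def spark_session_available():
    """Return True if the importable pyspark exposes pyspark.sql.SparkSession."""
    try:
        sql_module = importlib.import_module("pyspark.sql")
    except ImportError:
        # pyspark cannot be imported in this interpreter at all.
        return False
    return hasattr(sql_module, "SparkSession")


def start_spark():
    """Hypothetical helper: create the session and reuse its SparkContext."""
    # Explicit import, as in the answer above (kept local so this file
    # can still be loaded on a machine without pyspark).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Reuse the context owned by the session instead of calling
    # SparkContext() again after the session already exists.
    sc = spark.sparkContext
    return spark, sc


if __name__ == "__main__":
    print("SparkSession available:", spark_session_available())
```

If the check prints `False` inside the pipeline but `True` on the command line, the two environments are not using the same PySpark installation, and the import line alone will not be enough.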

Tags:

apache-spark

pyspark

Matteo Perico
