I am using Jupyter Notebook and have just started learning Apache Spark, but I get an error while creating a simple RDD:
sc.parallelize([2, 3, 4]).count()
The error is: parallelize() missing 1 required positional argument: 'c'
This happens with every method; for example, if I try textFile(""), I also get an error saying a positional argument is missing. I have the SparkContext as sc. Can someone please help me with this?
You have to initialize a SparkContext first. The error suggests that your sc is bound to the SparkContext class itself rather than to an instance of it, so the list you pass ends up being treated as self and Python reports the actual data argument, c, as missing.
Here is sample code from Learning Spark: Lightning-Fast Big Data Analysis:
from pyspark import SparkConf, SparkContext

# Configure and create a SparkContext instance (note the parentheses)
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)
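For context, you can reproduce the original error without Spark at all. This is a minimal sketch using a hypothetical stand-in class (FakeContext, not part of PySpark) whose method has the same shape as parallelize(self, c, numSlices=None); assigning the class instead of an instance produces the same "missing 1 required positional argument" message:

```python
class FakeContext:
    """Hypothetical stand-in for SparkContext, for illustration only."""
    def parallelize(self, c, numSlices=None):
        return list(c)

# Mistake: binding the class itself, not an instance.
sc = FakeContext
try:
    sc.parallelize([2, 3, 4])   # the list is consumed as `self`, so `c` is missing
except TypeError as err:
    print(err)                  # ... missing 1 required positional argument: 'c'

# Correct: instantiate first, then call the method.
sc = FakeContext()
print(sc.parallelize([2, 3, 4]))  # [2, 3, 4]
```

The same logic applies to your real session: make sure sc was created by calling SparkContext(...), not by assigning the class name.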