I am getting an error while creating a simple RDD in Spark

I am using a Jupyter notebook and have just started to learn Apache Spark, but I am getting an error while creating a simple RDD:

sc.parallelize([2, 3, 4]).count()

The error is: `parallelize() missing 1 required positional argument: 'c'`. This happens for every method; for example, if I try `textFile("")`, I also get an error saying a positional argument is missing. I have the SparkContext as `sc`. Can someone please help me with this?

Sahil asked Mar 30 '17 11:03
1 Answer

You have to initialize a SparkContext first.

Here is sample code from Learning Spark: Lightning-Fast Big Data Analysis:

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)
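Once `sc` is a real SparkContext instance, `sc.parallelize([2, 3, 4]).count()` works as expected. A likely cause of the original error (though the question doesn't show the assignment) is that `sc` was bound to the `SparkContext` class itself instead of an instance, e.g. `sc = SparkContext` without parentheses. Calling a method on the class then binds your first argument to `self`, so Python reports the next parameter (`'c'` for `parallelize`) as missing. Here is a minimal sketch with a hypothetical `Greeter` class (not part of PySpark) showing the same failure mode:

```python
# Hypothetical class used only to illustrate the error mechanism.
class Greeter:
    def greet(self, name):
        return "hello " + name

g = Greeter          # forgot the parentheses: g is the class, not an instance
try:
    g.greet("world") # "world" is bound to self, so 'name' has no value
except TypeError as e:
    print(e)         # ...greet() missing 1 required positional argument: 'name'

g = Greeter()        # instantiate, just like sc = SparkContext(conf=conf)
print(g.greet("world"))  # hello world
```

The same fix applies to Spark: make sure `sc` comes from calling `SparkContext(conf=conf)`, not from the bare class name.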
Haha TTpro answered Sep 21 '22 18:09