unbound method createDataFrame()

I get an error when I try to create a DataFrame from an RDD.
My code:

from pyspark import SparkConf, SparkContext
from pyspark import sql


conf = SparkConf()
conf.setMaster('local')
conf.setAppName('Test')
sc = SparkContext(conf = conf)
print sc.version

rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)])

df = sql.SQLContext.createDataFrame(rdd, ["id", "score"]).collect()

print df

Error:

df = sql.SQLContext.createDataFrame(rdd, ["id", "score"]).collect()
TypeError: unbound method createDataFrame() must be called with SQLContext 
           instance as first argument (got RDD instance instead)

The same task works in the Spark shell, where the last three lines of code alone print the values. I suspect the import statements, because that is the only difference between running in the IDE and running in the shell.

Jack Daniel asked May 03 '26 17:05



1 Answer

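createDataFrame() is an instance method, so you need to call it on an instance of SQLContext rather than on the class itself. The PySpark shell typically predefines such an instance as sqlContext, which would explain why the same lines work there. In a standalone script you have to create it yourself, for example: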

sqlContext = sql.SQLContext(sc)  # wrap the existing SparkContext
df = sqlContext.createDataFrame(rdd, ["id", "score"]).collect()  # collect() returns a list of Row objects

More details are in the PySpark documentation.
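
To see why the class-level call fails, here is a minimal sketch that does not need Spark at all. FakeSQLContext is a hypothetical stand-in written only for this illustration; it is not part of pyspark and only loosely mimics what createDataFrame() does. It assumes nothing beyond plain Python.

```python
class FakeSQLContext(object):
    """Hypothetical stand-in for SQLContext (illustration only, not pyspark)."""

    def __init__(self, sc):
        self.sc = sc

    def createDataFrame(self, data, schema):
        # Pair each row with the column names, loosely mimicking a DataFrame.
        return [dict(zip(schema, row)) for row in data]


rows = [(0, 1), (1, 20)]

# Called on the class: no instance is supplied, so `rows` lands in the
# `self` slot and the call is rejected with a TypeError.
try:
    FakeSQLContext.createDataFrame(rows, ["id", "score"])
    class_call_failed = False
except TypeError:
    class_call_failed = True

# Called on an instance: `self` is filled in automatically.
ctx = FakeSQLContext(sc=None)
result = ctx.createDataFrame(rows, ["id", "score"])
```

Python 2 reports the first call as an "unbound method" error, as in the question, while Python 3 complains about a missing argument. Both are a TypeError with the same cause: the method was called on the class instead of on an instance.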

Daniel de Paula answered May 06 '26 13:05



