
PySpark: 'PipelinedRDD' object has no attribute 'show'

I want to find all the items in df that are not in df1, and also the items in df1 that are not in df.

    df = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9])
    df1 = sc.parallelize([4, 5, 6, 7, 8, 9, 10])
    df2 = df.subtract(df1)
    df2.show()
    df3 = df1.subtract(df)
    df3.show()

I just want to inspect the result to check that I understand the function correctly, but I get this error: 'PipelinedRDD' object has no attribute 'show'. Any suggestions?

newleaf asked Dec 15 '16 00:12



2 Answers

    print(df2.take(10))

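show() is only available on a Spark DataFrame. sc.parallelize() and subtract() return RDDs, which have no show() method, so use an RDD action such as take() or collect() to look at the contents.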

Zhang Tong answered Nov 18 '22 20:11



Convert the RDD to a Spark DataFrame with createDataFrame. The resulting DataFrame has a show() method.

robinovitch61 answered Nov 18 '22 20:11
