I want to find all the items in df that are not in df1, and also the items in df1 that are not in df.
    df = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9])
    df1 = sc.parallelize([4, 5, 6, 7, 8, 9, 10])
    df2 = df.subtract(df1)
    df2.show()
    df3 = df1.subtract(df)
    df3.show()
I just want to check the results to see whether I understand the function correctly, but I get this error: 'PipelinedRDD' object has no attribute 'show'. Any suggestions?
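Use take (or collect) to inspect the contents of an RDD:
    print(df2.take(10))
show() is only available on Spark DataFrames. Calling subtract on an RDD returns another RDD, which has no such method.
To use show(), first convert the RDD to a Spark DataFrame with createDataFrame.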