 

'DataFrame' object is not callable in PySpark

temp = Window.partitionBy("id").orderBy("time").rowsBetween(-5, 5)
spark_df.withColumn("movingAvg",fn.avgspark_df("average")).over(temp)).show()

I'm getting this error on the last line:

'DataFrame' object is not callable

asked Oct 17 '25 22:10 by xinlin li


1 Answer

You are missing a bracket, but some of the syntax also seems wrong. I assume this is what your code looked like before the bracket went missing:

fn.avg(spark_df("average"))

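That is why you get the error: spark_df("average") tries to call the DataFrame as if it were a function. In PySpark you refer to a column as spark_df["average"], fn.col("average"), or simply the column name "average" (the df("col") form is Scala syntax). I believe you can achieve what you want with: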

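import pandas as pd
import pyspark.sql.functions as fn
from pyspark.sql import SparkSession, Window

spark = SparkSession.builder.getOrCreate()

# Sample data: two ids with five time steps each
df = pd.DataFrame({'id': [0,0,0,0,0,1,1,1,1,1],
                   'time': [1,2,3,4,5,1,2,3,4,5],
                   'average': [0,1,2,3,4,5,6,7,8,9]})
df = spark.createDataFrame(df)

# Per id, ordered by time: previous row, current row and next row
temp = Window.partitionBy("id").orderBy("time").rowsBetween(-1, 1)
# Pass the column name to fn.avg instead of calling the DataFrame
df.withColumn("movingAvg", fn.avg("average").over(temp)).show()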
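To see what the window frame computes, here is a plain-Python sketch (no Spark needed) that mimics avg("average").over(Window.partitionBy("id").orderBy("time").rowsBetween(-k, k)) on the sample data above. The function name centered_moving_avg is hypothetical, and the sketch assumes time values are unique within each id. It illustrates the frame semantics only, not how Spark evaluates windows internally.

```python
from collections import defaultdict


def centered_moving_avg(rows, k):
    """Hypothetical helper mimicking a rowsBetween(-k, k) average window.

    rows: list of (id, time, value) tuples.
    Returns a dict mapping (id, time) -> moving average.
    """
    # partitionBy("id"): group the rows by id
    partitions = defaultdict(list)
    for row_id, time, value in rows:
        partitions[row_id].append((time, value))

    result = {}
    for row_id, items in partitions.items():
        items.sort()  # orderBy("time"), assuming unique times per id
        values = [v for _, v in items]
        for i, (time, _) in enumerate(items):
            # Frame: up to k rows before and after, clipped at the partition edges
            frame = values[max(0, i - k): i + k + 1]
            result[(row_id, time)] = sum(frame) / len(frame)
    return result


rows = list(zip([0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
                [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

for (row_id, time), avg in sorted(centered_moving_avg(rows, 1).items()):
    print(row_id, time, avg)
```

With k = 1 the sketch gives 0.5, 1.0, 2.0, 3.0, 3.5 for id 0 and 5.5, 6.0, 7.0, 8.0, 8.5 for id 1, which should match the movingAvg column. With the question's rowsBetween(-5, 5), every frame covers all five rows of its id on this sample, so each row simply gets its partition's mean (2.0 for id 0, 7.0 for id 1).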
answered Oct 20 '25 11:10 by Florian