I have a Spark dataframe which has Date, Group and Price columns.
I'm trying to derive percentile(0.6) of the Price column of that dataframe in Python. I also need to add the output as a new column.
I tried the code below:
perudf = udf(lambda x: x.quantile(.6))
df1 = df.withColumn("Percentile", df.groupBy("group").agg("group"),perudf('price'))
but it is throwing the following error:
assert all(isinstance(c, Column) for c in exprs), "all exprs should be Column"
AssertionError: all exprs should be Column
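You can use "percentile_approx" through Spark SQL. Writing a UDF for this is difficult in PySpark: a UDF is called on one Price value per row, so it never sees the whole column (or group) needed to compute a percentile, and a plain float has no quantile method. The AssertionError itself comes from agg("group"): GroupedData.agg expects Column expressions (or a dict), not a column name string.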
Refer to this link for other details: https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCALte62wQV68D6J87EVq6AD5-T3D0F3fHjuzs+1C5aCHOUUQS8w@mail.gmail.com%3E