How to derive a percentile using a Spark DataFrame and groupBy in Python

Question

I have a Spark dataframe which has Date, Group and Price columns.

I'm trying to compute the 0.6 percentile (the 60th percentile) of the Price column for each group in Python, and I need to add the result to the dataframe as a new column.

I tried the code below:

perudf = udf(lambda x: x.quantile(.6))
df1 = df.withColumn("Percentile", df.groupBy("group").agg("group"),perudf('price'))

but it is throwing the following error:

assert all(isinstance(c, Column) for c in exprs), "all exprs should be Column"
AssertionError: all exprs should be Column

user3343061 · Accepted Answer

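You can use the "percentile_approx" aggregate function through SQL. A Python udf is applied one row at a time, so it cannot compute a quantile over a whole group, and writing a user-defined aggregate function is difficult in PySpark.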

Refer to this link for other details: https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCALte62wQV68D6J87EVq6AD5-T3D0F3fHjuzs+1C5aCHOUUQS8w@mail.gmail.com%3E

Tags: python-2.7, apache-spark, pyspark, pyspark-sql

Somashekar Muniyappa
