 

Is Spark SQL UDAF (user defined aggregate function) available in the Python API?

As of Spark 1.5.0 it seems possible to write your own UDAFs for custom aggregations on DataFrames (see "Spark 1.5 DataFrame API Highlights: Date/Time/String Handling, Time Intervals, and UDAFs").

However, it is unclear to me whether this functionality is supported in the Python API.

Asked Nov 03 '15 at 15:11 by kentt


1 Answer

You cannot define a Python UDAF in Spark 1.5.0 through 2.0.0. There is a JIRA issue tracking this feature request:

  • https://issues.apache.org/jira/browse/SPARK-10915

It was resolved as "Later", so it probably won't happen anytime soon.

You can, however, use a Scala UDAF from PySpark; this is described in "Spark: How to map Python with Scala or Java User Defined Functions?".
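For context on what such a Scala/Java UDAF has to provide: in Spark 1.5+ it extends `org.apache.spark.sql.expressions.UserDefinedAggregateFunction` and implements, among other members, `initialize`, `update`, `merge` and `evaluate`. The plain-Python sketch below is NOT a PySpark API; the class `GeometricMeanModel` and the helper `aggregate_partitions` are hypothetical names used only to model that contract for a geometric mean, with per-partition updates followed by a merge of the partial buffers. Spark drives these callbacks itself on the JVM; the sketch only shows the order in which they are applied.

```python
from functools import reduce
from typing import List, Optional, Tuple

# Aggregation buffer: (row count, running product).
Buffer = Tuple[int, float]


class GeometricMeanModel:
    """Hypothetical plain-Python model of the UDAF callbacks (not a Spark API).

    Assumes all input values are positive.
    """

    def initialize(self) -> Buffer:
        # Empty buffer for one group, before any rows are seen.
        return (0, 1.0)

    def update(self, buffer: Buffer, value: float) -> Buffer:
        # Fold a single input row into a partial buffer.
        count, product = buffer
        return (count + 1, product * value)

    def merge(self, b1: Buffer, b2: Buffer) -> Buffer:
        # Combine two partial buffers computed on different partitions.
        return (b1[0] + b2[0], b1[1] * b2[1])

    def evaluate(self, buffer: Buffer) -> Optional[float]:
        # Produce the final result; None stands in for SQL NULL on empty input.
        count, product = buffer
        return product ** (1.0 / count) if count else None


def aggregate_partitions(udaf: GeometricMeanModel,
                         partitions: List[List[float]]) -> Optional[float]:
    # Hypothetical driver: update within each partition, then merge the partials.
    partials = [reduce(udaf.update, part, udaf.initialize()) for part in partitions]
    merged = reduce(udaf.merge, partials, udaf.initialize())
    return udaf.evaluate(merged)


if __name__ == "__main__":
    print(aggregate_partitions(GeometricMeanModel(), [[1.0, 2.0], [4.0]]))
```

Splitting the same values differently, e.g. `[[1.0], [2.0], [4.0]]`, yields the same result because `merge` is associative; that property is what allows partial aggregates computed on separate partitions to be combined afterwards.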

Answered Nov 16 '22 at 02:11 by user6022341 (2 revisions)