As of Spark 1.5.0 it seems possible to write your own UDAFs for custom aggregations on DataFrames: Spark 1.5 DataFrame API Highlights: Date/Time/String Handling, Time Intervals, and UDAFs
It is, however, unclear to me whether this functionality is supported in the Python API.
You cannot define a Python UDAF in Spark 1.5.0-2.0.0. There is a JIRA tracking this feature request:
It was resolved with resolution "Later", so it probably won't happen anytime soon.
You can use a Scala UDAF from PySpark; this is described in Spark: How to map Python with Scala or Java User Defined Functions?