
PySpark transform method that's equivalent to the Scala Dataset#transform method

The Spark Scala API has a Dataset#transform method that makes it easy to chain custom DataFrame transformations like so:

val weirdDf = df
  .transform(myFirstCustomTransformation)
  .transform(anotherCustomTransformation)

I don't see an equivalent transform method for pyspark in the documentation.

Is there a PySpark way to chain custom transformations?

If not, how can the pyspark.sql.DataFrame class be monkey patched to add a transform method?

Update

The transform method was added to the PySpark DataFrame API in PySpark 3.0.
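For reference, on Spark 3.0+ the built-in DataFrame.transform can be used directly; here is a minimal sketch, where with_doubled_id is a hypothetical custom transformation used only for illustration:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def with_doubled_id(df):
    # hypothetical custom transformation: add a column with id * 2
    return df.withColumn("doubled_id", F.col("id") * 2)

# transform passes the DataFrame to the function and returns its result
spark.range(3).transform(with_doubled_id).show()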

asked Sep 15 '17 by Powers


People also ask

What is StringIndexer in PySpark?

A label indexer that maps a string column of labels to an ML column of label indices. If the input column is numeric, we cast it to string and index the string values. The indices are in [0, numLabels). By default, this is ordered by label frequencies so the most frequent label gets index 0.
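For context, a minimal sketch of typical StringIndexer usage; the toy data and column names are illustrative only:

from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer

spark = SparkSession.builder.getOrCreate()

# "a" is the most frequent label, so it receives index 0.0
df = spark.createDataFrame(
    [(0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")],
    ["id", "category"],
)

indexer = StringIndexer(inputCol="category", outputCol="categoryIndex")
indexer.fit(df).transform(df).show()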

What is transform in PySpark?

PySpark RDD transformations are lazily evaluated and are used to transform one RDD into another. When executed on an RDD, they produce one or more new RDDs.
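A small sketch of that laziness; the numbers are arbitrary:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([1, 2, 3, 4])

# map and filter are transformations: nothing runs yet, they only
# describe new RDDs derived from the original one
doubled = rdd.map(lambda x: x * 2)
large = doubled.filter(lambda x: x > 4)

# collect is an action, which triggers the actual evaluation
print(large.collect())  # [6, 8]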

Can we use a Dataset in PySpark?

A Dataset is a distributed collection of data. Dataset is a new interface added in Spark 1.6 that provides the benefits of RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL's optimized execution engine.


1 Answer

Implementation:

from pyspark.sql.dataframe import DataFrame

def transform(self, f):
    # Call f with this DataFrame and return whatever DataFrame f produces,
    # which is what allows custom transformations to be chained
    return f(self)

# Monkey patch: attach the method so every DataFrame instance gets .transform
DataFrame.transform = transform

Usage:

spark.range(1).transform(lambda df: df.selectExpr("id * 2"))
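With the monkey-patched method (or the built-in one on Spark 3.0+), chaining then reads like the Scala snippet from the question; the two transformation bodies below are hypothetical examples:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def my_first_custom_transformation(df):
    # hypothetical: add a doubled id column
    return df.withColumn("doubled", F.col("id") * 2)

def another_custom_transformation(df):
    # hypothetical: add a constant label column
    return df.withColumn("label", F.lit("weird"))

weird_df = (
    spark.range(3)
    .transform(my_first_custom_transformation)
    .transform(another_custom_transformation)
)
weird_df.show()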
answered Nov 15 '22 by Alper t. Turker