
In Palantir Foundry, how should I get the current SparkSession in a Transform?

I'm writing a Python Transform and need to get the SparkSession so I can construct a DataFrame.

How should I do this?

hjones asked Nov 14 '25 17:11


1 Answer

Add a ctx parameter to your compute function. Foundry then passes in the TransformContext, and its spark_session attribute gives you the SparkSession.

from pyspark.sql.types import StringType, StructField, StructType
from transforms.api import Output, transform


@transform(
    output=Output('/path/to/first/output/dataset'),
)
def my_compute_function(ctx, output):
    # type: (TransformContext, TransformOutput) -> None

    # In this example, the Spark session is used to create an empty data frame.
    columns = [
        StructField("col_a", StringType(), True)
    ]
    empty_df = ctx.spark_session.createDataFrame([], schema=StructType(columns))

    output.write_dataframe(empty_df)

This example can also be found in the Foundry documentation here: https://www.palantir.com/docs/foundry/transforms-python/transforms-python-api/#transform
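
If you need rows rather than an empty frame, the same session can build a DataFrame from in-memory data. The sketch below is only illustrative: build_example_df and its row values are hypothetical names and data, not part of the Foundry API, and it assumes the object passed in behaves like a pyspark SparkSession, such as ctx.spark_session above.

```python
def build_example_df(spark_session):
    """Build a one-column DataFrame from in-memory rows.

    `spark_session` is assumed to be a pyspark.sql.SparkSession, for example
    the `ctx.spark_session` attribute used in the answer above.
    """
    # Each tuple is one row; these values are made up for illustration.
    rows = [("first",), ("second",)]
    # createDataFrame accepts a DDL-formatted string as the schema,
    # so no pyspark.sql.types imports are needed here.
    return spark_session.createDataFrame(rows, schema="col_a string")
```

Inside the compute function this would be used as output.write_dataframe(build_example_df(ctx.spark_session)). Keeping the construction in a helper that takes the session as an argument also makes it possible to exercise the logic outside Foundry with a local or stand-in session.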

tomwhittaker answered Nov 17 '25 09:11


