
Modify nested property inside Struct column with PySpark

I want to modify (filter) a property inside a struct. Let's say I have a DataFrame with the following column:

#+------------------------------------------+
#|                 arrayCol                 |
#+------------------------------------------+
#| {"a" : "some_value", "b" : [1, 2, 3]}    |
#+------------------------------------------+

Schema:

struct<a:string, b:array<int>>

I want to filter out the values in the 'b' array that are equal to 1.

The desired result is the following:

#+------------------------------------------+
#|                 arrayCol                 |
#+------------------------------------------+
#| {"a" : "some_value", "b" : [2, 3]}       |
#+------------------------------------------+

Is it possible to do this without extracting the property, filtering the values, and rebuilding the struct?

gael asked Mar 17 '26 04:03



1 Answer

Update:

For Spark 3.1+, withField can be used to update a single field of a struct column without recreating the whole struct. In your case, you can update the field b, using the filter function to drop the unwanted array values, like this:

import pyspark.sql.functions as F

df1 = df.withColumn(
    'arrayCol',
    F.col('arrayCol').withField('b', F.filter(F.col("arrayCol.b"), lambda x: x != 1))
)

df1.show()
#+--------------------+
#|            arrayCol|
#+--------------------+
#|{some_value, [2, 3]}|
#+--------------------+

For older versions, Spark doesn't support adding or updating individual fields in nested structures. To update a struct column, you'll need to create a new struct from the existing fields and the updated ones (the SQL filter higher-order function used here requires Spark 2.4+):

import pyspark.sql.functions as F

df1 = df.withColumn(
    "arrayCol",
    F.struct(
        F.col("arrayCol.a").alias("a"),
        F.expr("filter(arrayCol.b, x -> x != 1)").alias("b")
    )
)
blackbishop answered Mar 20 '26 09:03



