
How to create empty struct in pyspark?

Tags:

pyspark

I'm trying to create an empty struct column in PySpark. For an array, this works:

import pyspark.sql.functions as F
df = df.withColumn('newCol', F.array([]))

but this gives me an error:

df = df.withColumn('newCol', F.struct())

I saw a similar question, but it was for Scala rather than PySpark, so it doesn't really help me.

asked Dec 17 '25 19:12 by AlienDeg


2 Answers

Actually the array is not really empty, because it has an empty element. You should instead consider something like this:

import pyspark.sql.types as T
df = df.withColumn('newCol', F.lit(None).cast(T.StructType()))

PS: This is a late conversion of my comment into an answer, as was suggested. I hope it helps even though it comes well after the OP's question.

answered Dec 20 '25 16:12 by Christophe


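If you know the schema of the struct column, you can use the function from_json as follows:

    import pyspark.sql.functions as F
    from pyspark.sql.types import StructType, StructField, StringType

    # Schema of the struct column to create
    struct_schema = StructType([
       StructField('name', StringType(), False),
       StructField('surname', StringType(), False),
    ])

    # Parse an empty string literal with the schema to get a struct-typed column
    df = df.withColumn(
      'newCol', F.from_json(F.lit(""), struct_schema)
    )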

answered Dec 20 '25 15:12 by Demet Sude Saplık


