My PySpark data frame has the following schema:
spark_df.printSchema()
root
|-- field_1: double (nullable = true)
|-- field_2: double (nullable = true)
|-- field_3 (nullable = true)
|-- field_4: double (nullable = true)
|-- field_5: double (nullable = true)
|-- field_6: double (nullable = true)
I would like to add one more StructField to the schema, so the new schema would looks like:
root
|-- field_0: string (nullable = true)
|-- field_1: double (nullable = true)
|-- field_2: double (nullable = true)
|-- field_3 (nullable = true)
|-- field_4: double (nullable = true)
|-- field_5: double (nullable = true)
|-- field_6: double (nullable = true)
I know I can manually create a new_schema like below:
new_schema = StructType([StructField("field_0", StringType(), True),
:
StructField("field_6", IntegerType(), True)])
This works for a small number of fields but doesn't scale when I have hundreds of them. So I'm wondering: is there a more elegant way to add a new field to the beginning of the schema? Thanks!
You can copy the existing fields and prepend the new one:
from pyspark.sql.types import StructType, StructField, StringType

to_prepend = [StructField("field_0", StringType(), True)]
new_schema = StructType(to_prepend + df.schema.fields)