Add one more StructField to schema

My PySpark data frame has the following schema:

spark_df.printSchema()
root
 |-- field_1: double (nullable = true)
 |-- field_2: double (nullable = true)
 |-- field_3 (nullable = true)
 |-- field_4: double (nullable = true)
 |-- field_5: double (nullable = true)
 |-- field_6: double (nullable = true)

I would like to add one more StructField to the beginning of the schema, so the new schema would look like:

root
 |-- field_0: double (nullable = true)
 |-- field_1: double (nullable = true)
 |-- field_2: double (nullable = true)
 |-- field_3 (nullable = true)
 |-- field_4: double (nullable = true)
 |-- field_5: double (nullable = true)
 |-- field_6: double (nullable = true)

I know I can manually create a new_schema like below:

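from pyspark.sql.types import StructType, StructField, StringType, IntegerType

new_schema = StructType([StructField("field_0", StringType(), True),
                         # ... one StructField per remaining field ...
                         StructField("field_6", IntegerType(), True)])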

This works for a small number of fields, but it isn't practical when there are hundreds of them. Is there a more elegant way to add a new field to the beginning of the schema? Thanks!

asked Sep 18 '16 by Edamame


1 Answer

You can copy the existing fields and prepend the new one:

from pyspark.sql.types import StructField, StructType, StringType

to_prepend = [StructField("field_0", StringType(), True)]

new_schema = StructType(to_prepend + df.schema.fields)
answered Oct 20 '22 by zero323