I'm looking for a way to change a PySpark DataFrame column type, from this (the array element's containsNull is false):

df.printSchema()
root
 |-- key: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: string (containsNull = false)

to this (containsNull is true):

root
 |-- key: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: string (containsNull = true)

Thank you in advance for your help.
You have to replace the column with a new schema. ArrayType takes two parameters, elementType and containsNull.
from pyspark.sql.types import StructType, StructField, StringType, ArrayType
from pyspark.sql.functions import udf

# Build a sample DataFrame whose "values" array does not allow null elements.
x = [("a", ["b", "c", "d", "e"]), ("g", ["h", "h", "d", "e"])]
schema = StructType([
    StructField("key", StringType(), nullable=True),
    StructField("values", ArrayType(StringType(), containsNull=False)),
])
df = spark.createDataFrame(x, schema=schema)
df.printSchema()

# An identity UDF declared with the desired return type rewrites the
# column's schema so the element type has containsNull=True.
new_schema = ArrayType(StringType(), containsNull=True)
udf_foo = udf(lambda x: x, new_schema)
df.withColumn("values", udf_foo("values")).printSchema()
Before (original schema):

root
 |-- key: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: string (containsNull = false)

After (with the UDF applied):

root
 |-- key: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: string (containsNull = true)
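If you'd rather avoid a Python UDF (which serializes every row through Python), a plain cast should also work here, since Spark allows widening an array element's nullability when casting. A minimal sketch, assuming the same df as above:

from pyspark.sql.functions import col
from pyspark.sql.types import ArrayType, StringType

# Casting to the same element type with containsNull=True only widens the
# nullability; the data is unchanged and only the schema differs.
df.withColumn(
    "values",
    col("values").cast(ArrayType(StringType(), containsNull=True)),
).printSchema()

Because the cast is handled natively by Spark, it keeps the whole operation in the JVM instead of round-tripping each row through the Python worker.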