
How to change a PySpark DataFrame column's data type?

I'm looking for a way to change the type of a PySpark DataFrame column,

from this df.printSchema() output:

[screenshot of the current schema]

to:

[screenshot of the desired schema]

Thanks in advance for your help.

asked Oct 19 '25 by user2763088

1 Answer

You have to replace the column with a new schema. ArrayType takes two parameters: elementType and containsNull.

from pyspark.sql.types import StructType, StructField, StringType, ArrayType
from pyspark.sql.functions import udf

x = [("a", ["b", "c", "d", "e"]), ("g", ["h", "h", "d", "e"])]
schema = StructType([StructField("key", StringType(), nullable=True),
                     StructField("values", ArrayType(StringType(), containsNull=False))])

df = spark.createDataFrame(x, schema=schema)
df.printSchema()

# An identity udf re-evaluates the column under the declared return type,
# which rewrites the array's containsNull flag in the schema.
new_schema = ArrayType(StringType(), containsNull=True)
udf_foo = udf(lambda x: x, new_schema)
df.withColumn("values", udf_foo("values")).printSchema()



root
 |-- key: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: string (containsNull = false)

root
 |-- key: string (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: string (containsNull = true)
answered Oct 21 '25 by pauli