I am trying to add an empty (all-null) column to my DataFrame df1 in PySpark.
The code I tried:
import pyspark.sql.functions as F
df1 = df1.withColumn("empty_column", F.lit(None))
But I get this error:
pyspark.sql.utils.AnalysisException: Parquet data source does not support void data type.
Can anyone help me with this?
Instead of just F.lit(None), cast the null literal to a concrete data type. E.g.:
F.lit(None).cast('string')
F.lit(None).cast('double')
When we add a literal null column, its data type is void:
from pyspark.sql import functions as F
spark.range(1).withColumn("empty_column", F.lit(None)).printSchema()
# root
# |-- id: long (nullable = false)
# |-- empty_column: void (nullable = true)
But the Parquet format does not support the void data type, so such columns must be cast to some other data type before saving.