pyspark.sql.utils.AnalysisException: Parquet data source does not support void data type

I am trying to add a column in my dataframe df1 in PySpark.

The code I tried:

import pyspark.sql.functions as F
df1 = df1.withColumn("empty_column", F.lit(None))

But I get this error:

pyspark.sql.utils.AnalysisException: Parquet data source does not support void data type.

Can anyone help me with this?

asked Mar 29 '26 by ar_mm18
1 Answer

Instead of a bare F.lit(None), cast the null literal to a concrete data type. E.g.:

F.lit(None).cast('string')
F.lit(None).cast('double')

When we add a literal null column, its data type is void:

from pyspark.sql import functions as F
spark.range(1).withColumn("empty_column", F.lit(None)).printSchema()
# root
#  |-- id: long (nullable = false)
#  |-- empty_column: void (nullable = true)

But the Parquet format does not support the void data type, so such columns must be cast to some other data type before saving.

answered Mar 31 '26 by ZygD
