Pyspark add empty literal map of type string

Similar to this question I want to add a column to my pyspark DataFrame containing nothing but an empty map. If I use the suggested answer from that question, however, the type of the map is <null,null>, unlike in the answer posted there.

from pyspark.sql.functions import create_map
spark.range(1).withColumn("test", create_map()).printSchema()

root
 |-- test: map (nullable = false)
 |    |-- key: null
 |    |-- value: null (valueContainsNull = false)

I need an empty <string,string> map. I can do it in Scala like so:

import org.apache.spark.sql.functions.typedLit
spark.range(1).withColumn("test", typedLit(Map[String, String]())).printSchema()

root
 |-- test: map(nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

How can I do it in pyspark? I am using Spark 3.0.1 with underlying Scala 2.12 on Databricks Runtime 7.3 LTS. I need the <string,string> map because otherwise I can't save my DataFrame to parquet:

AnalysisException: Parquet data source does not support map<null,null> data type.;
asked Nov 16 '25 by Alarik

1 Answer

You can create the map with create_map and then cast it to the appropriate type.


from pyspark.sql.functions import create_map
spark.range(1).withColumn("test", create_map().cast("map<string,string>")).printSchema()

root
 |-- id: long (nullable = false)
 |-- test: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
answered Nov 18 '25 by Nithish

