I want to add a new map-type column to a DataFrame, like this:
|-- cMap: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
I tried the code:
df.withColumn("cMap", lit(null).cast(MapType)).printSchema
The error is:
<console>:132: error: overloaded method value cast with alternatives:
(to: String)org.apache.spark.sql.Column <and>
(to: org.apache.spark.sql.types.DataType)org.apache.spark.sql.Column
cannot be applied to (org.apache.spark.sql.types.MapType.type)
Is there another way to cast the new column to Map or MapType? Thanks
Using Spark DataTypes: we can create a map type with the createMapType() function on the DataTypes class. This method takes two arguments, keyType and valueType, and both must be types that extend DataType.
In PySpark, MapType is the data type used to represent a Python dictionary (dict) of key-value pairs. A MapType object comprises three fields: keyType (a DataType), valueType (a DataType), and valueContainsNull (a boolean).
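For reference, the asker's error occurs because cast was handed the MapType companion object rather than a constructed instance. A minimal Scala sketch of both construction routes described above, reusing the asker's df:

import org.apache.spark.sql.types.{DataTypes, MapType, StringType}
import org.apache.spark.sql.functions.lit

// Build an explicit MapType instance; valueContainsNull defaults to true
val mapSchema = MapType(StringType, StringType, valueContainsNull = true)
// Equivalent, via the DataTypes factory class:
// val mapSchema = DataTypes.createMapType(StringType, StringType)

df.withColumn("cMap", lit(null).cast(mapSchema)).printSchema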
In PySpark, the withColumn() function is a widely used DataFrame transformation: it can change a column's value, convert the data type of an existing column, create a new column, and more, as sketched below.
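A short Scala sketch of those withColumn() uses, assuming a hypothetical df with a numeric amount column:

import org.apache.spark.sql.functions.col

// Derive a new column from an existing one
val taxed = df.withColumn("amountWithTax", col("amount") * 1.08)
// Convert the data type of an existing column in place
val asString = df.withColumn("amount", col("amount").cast("string"))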
Solution: PySpark provides a create_map() function that takes a list of columns as arguments and returns a MapType column, so we can use it to convert a DataFrame struct column to a map. struct is a StructType column, while MapType stores dictionary-style key-value pairs.
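In Scala, the counterpart of PySpark's create_map() is the map() function in org.apache.spark.sql.functions. A sketch of the struct-to-map conversion, assuming a hypothetical struct column named properties with string fields eye and hair:

import org.apache.spark.sql.functions.{col, lit, map}

// Flatten each struct field into an explicit key/value pair
val converted = df.withColumn(
  "propertiesMap",
  map(
    lit("eye"), col("properties.eye"),
    lit("hair"), col("properties.hair")
  )
).drop("properties")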
I had the same problem; I finally found a solution:
df.withColumn("cMap", typedLit(Map.empty[String, String]))
From the ScalaDocs for typedLit:
The difference between this function and [[lit]] is that this function can handle parameterized scala types e.g.: List, Seq and Map.
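To illustrate, a small sketch of values typedLit accepts that lit cannot handle (column names hypothetical):

import org.apache.spark.sql.functions.typedLit

df.withColumn("cMap", typedLit(Map("key" -> "value")))  // map<string,string>
df.withColumn("cSeq", typedLit(Seq(1, 2, 3)))           // array<int>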