
How to add empty map type column to DataFrame?

I want to add a new map type column to a dataframe, like this:

|-- cMap: map (nullable = true)
|    |-- key: string
|    |-- value: string (valueContainsNull = true)

I tried the code:

df.withColumn("cMap", lit(null).cast(MapType)).printSchema

The error is:

<console>:132: error: overloaded method value cast with alternatives:
(to: String)org.apache.spark.sql.Column <and>
(to: org.apache.spark.sql.types.DataType)org.apache.spark.sql.Column
cannot be applied to (org.apache.spark.sql.types.MapType.type)

Is there another way to cast the new column to Map or MapType? Thanks.

Pingjiang Li asked May 28 '17 04:05



1 Answer

I had the same problem and finally found a solution:

import org.apache.spark.sql.functions.typedLit  // available since Spark 2.2
df.withColumn("cMap", typedLit(Map.empty[String, String]))

From ScalaDocs for typedLit:

The difference between this function and [[lit]] is that this function can handle parameterized scala types e.g.: List, Seq and Map.
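
As for the original error: cast expects a DataType instance, but a bare MapType refers to the companion object (hence MapType.type in the message). Passing a concrete instance such as MapType(StringType, StringType) should compile. The sketch below puts both approaches side by side. It is a rough illustration, not a definitive implementation: the local SparkSession, the object name EmptyMapColumn, and the sample "id" column are assumptions made up for the example and do not come from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, typedLit}
import org.apache.spark.sql.types.{MapType, StringType}

object EmptyMapColumn {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-map-column")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the question's df.
    val df = Seq("a", "b").toDF("id")

    // Option 1: cast a null literal to a concrete MapType instance.
    // Every row holds null in cMap.
    val withNullMap =
      df.withColumn("cMap", lit(null).cast(MapType(StringType, StringType)))

    // Option 2 (the approach from this answer, Spark 2.2+):
    // every row holds an empty, non-null map in cMap.
    val withEmptyMap =
      df.withColumn("cMap", typedLit(Map.empty[String, String]))

    withNullMap.printSchema()
    withEmptyMap.printSchema()

    spark.stop()
  }
}
```

The two options are not interchangeable: the first stores null, the second stores an empty map, so the reported nullability of cMap may differ between the two schemas. Pick whichever matches how downstream code treats missing values.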

bartholomaios answered Sep 29 '22 14:09