 

Error while exploding a struct column in Spark

I have a dataframe whose schema looks like this:

event: struct (nullable = true)
|    | event_category: string (nullable = true)
|    | event_name: string (nullable = true)
|    | properties: struct (nullable = true)
|    |    | ErrorCode: string (nullable = true)
|    |    | ErrorDescription: string (nullable = true)

I am trying to explode the struct column properties using the following code:

df_json.withColumn("event_properties", explode($"event.properties"))

But it is throwing the following exception:

cannot resolve 'explode(`event`.`properties`)' due to data type mismatch: 
input to function explode should be array or map type, 
not StructType(StructField(IDFA,StringType,true),

How can I explode the properties column?

shiva.n404 asked Jan 18 '18 06:01

People also ask

How do you explode a struct in Spark?

Problem: how to explode an array of StructType columns into rows with Spark. Solution: Spark's explode function turns an ArrayType(StructType) column into one row per array element; the Scala example below starts from a DataFrame that has struct columns inside an array.
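
As a minimal sketch in the Scala API (the DataFrame, column names, and sample values here are made up for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("explode-array-demo").getOrCreate()
import spark.implicits._

// items becomes an array<struct<_1:string,_2:int>> column
val orders = Seq(
  ("order-1", Seq(("sku-1", 2), ("sku-2", 1))),
  ("order-2", Seq(("sku-3", 5)))
).toDF("order_id", "items")

// explode yields one row per array element; each element is still a struct
val exploded = orders.withColumn("item", explode($"items"))

// the struct's fields can then be selected out as ordinary columns
exploded.select($"order_id", $"item._1".as("sku"), $"item._2".as("qty")).show(false)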

What is a struct column PySpark?

In PySpark, StructType defines the schema of a DataFrame. It is a collection (list) of StructField objects, each of which specifies a column's name, its data type, and a nullability flag.
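
The same classes exist in the Scala API used elsewhere on this page; a minimal sketch of defining a schema with them (the column names are hypothetical):

import org.apache.spark.sql.types._

// each StructField carries a column name, a data type and a nullability flag;
// the StructType groups them into the row schema of a DataFrame
val eventSchema = StructType(Seq(
  StructField("event_category", StringType, nullable = true),
  StructField("event_name", StringType, nullable = true)
))

// the schema can then be supplied when reading data, e.g.
// spark.read.schema(eventSchema).json("events.json")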

What is struct in Spark SQL?

The Spark SQL StructType and StructField classes are used to programmatically specify a DataFrame's schema and to create complex columns such as nested struct, array, and map columns. A StructType is a collection of StructFields.
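
For example, a nested schema along the lines of the one in the question could be declared as below (the tags and attrs columns are invented purely to show array and map fields):

import org.apache.spark.sql.types._

// properties is itself a struct, matching the shape shown in the question
val propertiesType = StructType(Seq(
  StructField("ErrorCode", StringType, nullable = true),
  StructField("ErrorDescription", StringType, nullable = true)
))

val eventType = StructType(Seq(
  StructField("event_category", StringType, nullable = true),
  StructField("event_name", StringType, nullable = true),
  StructField("properties", propertiesType, nullable = true)
))

// array and map columns are declared the same way
val fullSchema = StructType(Seq(
  StructField("event", eventType, nullable = true),
  StructField("tags", ArrayType(StringType), nullable = true),
  StructField("attrs", MapType(StringType, StringType), nullable = true)
))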




1 Answer

explode can only be applied to array or map columns, so you need to wrap the fields of the properties struct in an array and then apply the explode function, as below:

import org.apache.spark.sql.functions._
df_json.withColumn("event_properties", explode(array($"event.properties.*"))).show(false)

That should give you the result you need.
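
For context, a self-contained sketch of the above, assuming a one-row DataFrame built to match the schema in the question (the sample values are made up):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("explode-struct").getOrCreate()
import spark.implicits._

// build a single-row DataFrame whose schema matches the one in the question
val df_json = Seq(("click", "checkout_failed", "500", "Internal error"))
  .toDF("event_category", "event_name", "ErrorCode", "ErrorDescription")
  .select(
    struct(
      $"event_category",
      $"event_name",
      struct($"ErrorCode", $"ErrorDescription").as("properties")
    ).as("event")
  )

// wrapping the struct's fields in an array gives explode a valid input:
// one output row per field of the struct
df_json.withColumn("event_properties", explode(array($"event.properties.*"))).show(false)

// if the goal is one column per field rather than one row per field,
// the struct can also be flattened without explode
df_json.select($"event.properties.*").show(false)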

Ramesh Maharjan answered Oct 17 '22 13:10