I have an XML file converted to dataframe using spark-xml package. The dataframe has the following structure:
root
|-- results: struct (nullable = true)
| |-- result: struct (nullable = true)
| | |-- categories: struct (nullable = true)
| | | |-- category: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- value: string (nullable = true)
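To make this reproducible without the actual XML, here is a minimal sketch that builds a DataFrame with the same schema (the sample values are made up):
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, ArrayType, StringType

spark = SparkSession.builder.getOrCreate()

# Same nesting as the spark-xml output above
schema = StructType([
    StructField("results", StructType([
        StructField("result", StructType([
            StructField("categories", StructType([
                StructField("category", ArrayType(
                    StructType([StructField("value", StringType())])
                ))
            ]))
        ]))
    ]))
])

category = [("result1",), ("result2",)]  # array of structs with a single "value" field
row = ((((category,),),),)               # results > result > categories > category
df = spark.createDataFrame([row], schema)
df.printSchema()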
If I select the category column (which may appear multiple times under categories):
df.select((col('results.result.categories.category')).alias("result_categories"))
For one record, the result will look like:
[[result1], [result2]]
I'm trying to flatten the results:
[result1, result2]
When I use the flatten function, I'm getting an error message:
df.select(flatten(col('results.result.categories.category')).alias("Hits_Category"))
cannot resolve 'flatten(`results`.`result`.`categories`.`category`)' due to data type mismatch: The argument should be an array of arrays, but '`results`.`result`.`categories`.`category`' is of array<struct<value:string>> type.
I ended up creating a UDF and passing the column to it, which spits out a flattened string version of the column.
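For reference, the workaround looks roughly like this (the join-to-string behaviour is just what I happened to need):
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# Collapse the array of structs into one comma-separated string
@udf(returnType=StringType())
def flatten_categories(categories):
    if categories is None:
        return None
    return ",".join(c["value"] for c in categories)

df.select(
    flatten_categories(col("results.result.categories.category")).alias("Hits_Category")
).show()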
Is there a better way?
You're trying to apply the flatten function to an array of structs, while it expects an array of arrays:
flatten(arrayOfArrays) - Transforms an array of arrays into a single array.
You don't need a UDF; you can simply transform the array elements from struct to array, then use flatten.
Something like this:
from pyspark.sql.functions import col, expr, flatten

df.select(col('results.result.categories.category').alias("result_categories")) \
    .withColumn("result_categories", expr("transform(result_categories, x -> array(x.*))")) \
    .select(flatten(col("result_categories")).alias("Hits_Category")) \
    .show()
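Here array(x.*) wraps each struct's fields into an array (a single-element array in this case, since the struct only has value), so flatten can then merge them into one array; for the sample record above, Hits_Category should come out as [result1, result2].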
You could use the following:
import pyspark.sql.functions as F

df.selectExpr("explode(results.result.categories.category) AS structCol") \
    .select(F.expr("concat_ws(',', structCol.*)").alias("single_col")) \
    .show()
The struct fields will then appear as a single comma-separated string column.