Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to extract all elements from array of structs?

I have a Dataframe with different columns and where one of the column is an array of structs:

+----------+---------+--------------------------------------+
|id        |title    |                                values|
+----------+---------+--------------------------------------+
|        1 | aaa     |      [{name1, id1}, {name2, id2},...]|
|        2 | bbb     |  [{name11, id11}, {name22, id22},...]|

my df schema for this column looks like this:

|-- values: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- name: string (nullable = true)
|    |    |-- id: long (nullable = true)

I want to extract each value from this array column like this:

+----------+---------+--------------+
|id        |title    |name   |   _id|
+----------+---------+--------------+
|        1 | aaa     | name1 |  id1 |
|        1 | aaa     | name2 |  id2 |
|        2 | bbb     | name11| id11 |
|        2 | bbb     | name22| id22 |

I figured out how to extract the single item of the array:

df = df.withColumn("name", df["values"].getItem(0).name)\
.withColumn("_id", df["id"].getItem(0).id)\

but I don't know the way how to apply it for the whole length of array. I probably should do something like:

for index in range(len(df.values)):
    df = df.withColumn("name", df["values"].getItem(index).name)\
    .withColumn("_id", df["id"].getItem(index).id)\

could you please help me to solve it ? Thanks!

like image 950
LDropl Avatar asked Dec 14 '22 16:12

LDropl


1 Answers

Just explode and select

from pyspark.sql.functions import col, explode

df.withColumn("values", explode("values")).select(
  "*", col("values")["name"].alias("name"), col("values")["id"].alias("id")
)
like image 122
user9954377 Avatar answered Jan 03 '23 05:01

user9954377