Explode column with array of arrays - PySpark

I have a column with data like this:

[[[-77.1082606, 38.935738]] ,Point] 

I want it split out like:

  column 1          column 2        column 3
 -77.1082606      38.935738           Point

How is that possible using PySpark, or alternatively Scala (Databricks 3.0)? I know how to explode columns, but not how to split up these structs. Thanks!

EDIT: Here is the schema of the column:

 |-- geometry: struct (nullable = true)
 |    |-- coordinates: string (nullable = false)
 |    |-- type: string (nullable = false)
Ashley O asked Dec 31 '25 18:12

1 Answer

Since coordinates is a string, you can use regexp_replace() to strip the square brackets, then split() the resulting string on the comma into separate columns.

from pyspark.sql.functions import regexp_replace, split, col

# Strip the square brackets from the coordinates string, keep the type field,
# then split the remaining "lon, lat" string on the comma.
df.select(regexp_replace(df.geometry.coordinates, r"[\[\]]", "").alias("coordinates"),
          df.geometry.type.alias("col3")) \
  .withColumn("arr", split(col("coordinates"), ",")) \
  .select(col("arr")[0].alias("col1"),
          col("arr")[1].alias("col2"),
          "col3") \
  .show(truncate=False)
+-----------+----------+-----+
|col1       |col2      |col3 |
+-----------+----------+-----+
|-77.1082606| 38.935738|Point|
+-----------+----------+-----+
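To see what the two string operations do before they run inside Spark, here is a plain-Python sketch of the same transformation. The sample value is hypothetical, copied from the format shown in the question, and the steps mirror the regexp_replace/split calls above:

```python
import re

# Hypothetical sample matching the question's geometry.coordinates string
coordinates = "[[-77.1082606, 38.935738]]"

# Step 1: remove every "[" and "]", same regex as regexp_replace above
stripped = re.sub(r"[\[\]]", "", coordinates)   # "-77.1082606, 38.935738"

# Step 2: split on the comma, same as split(); note the leading space
# left in the second element, visible in col2 of the Spark output too
col1, col2 = stripped.split(",")
print(col1, col2.strip())  # -77.1082606 38.935738
```

If that leading space matters, splitting on the pattern ",\s*" instead of "," (or casting the columns to double) removes it.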
mtoto answered Jan 03 '26 08:01

