This is my dataframe:
+------------------------------------------------------------------------------------+
|value                                                                               |
+------------------------------------------------------------------------------------+
|[0.0, 1.0, 0.0, 7.0000000000000036, 0.0]                                            |
|[2.0000000000000036, 0.0, 2.9999999999999996, 4.0000000000000036, 5.000000000000002]|
|[4.000000000000006, 0.0, 0.0, 6.000000000000006, 7.000000000000004]                 |
+------------------------------------------------------------------------------------+
When I use:
dataFrame.withColumn("item_id", posexplode(dataFrame.col("value")))
I get this error:
org.apache.spark.sql.AnalysisException: The number of aliases supplied in the AS clause does not match the number of columns output by the UDTF expected 2 aliases but got item_id ;
So, how do I use posexplode to "create a new row for each element with position in the given array or map column"?
Things are clear if you use explode in .withColumn().
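For reference, a minimal sketch of the explode case (the local SparkSession setup and the sample column s are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder.master("local[*]").appName("explode-demo").getOrCreate()
import spark.implicits._

val df = Seq(("a", Seq(1, 2, 3))).toDF("n", "s")

// explode emits exactly one column, so withColumn accepts it directly
df.withColumn("item", explode($"s")).show
```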
In contrast, until the low-priority SPARK-20174 gets accepted and implemented, using posexplode along with withColumn isn't straightforward. You may want to use a workaround based on selectExpr, as shown below.
val df = Seq(
  ("a", Seq(1, 2, 3)),
  ("b", Seq(11, 22))).toDF("n", "s")
df.show
+---+---------+
| n| s|
+---+---------+
| a|[1, 2, 3]|
| b| [11, 22]|
+---+---------+
df.selectExpr("*", "posexplode(s) as (p,c)").drop("s").show
+---+---+---+
| n| p| c|
+---+---+---+
| a| 0| 1|
| a| 1| 2|
| a| 2| 3|
| b| 0| 11|
| b| 1| 22|
+---+---+---+
You can use posexplode with select, as below:
dataframe.select($"value", posexplode($"value")).show(false)
which returns two new columns, pos and col. Hope this helps!
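A self-contained sketch of this select approach (the SparkSession setup and the sample data are assumptions; the default output names pos and col can be renamed afterwards, e.g. with withColumnRenamed):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.posexplode

val spark = SparkSession.builder.master("local[*]").appName("posexplode-demo").getOrCreate()
import spark.implicits._

// sample array column named "value", as in the question
val df = Seq(Seq(0.0, 1.0), Seq(2.0)).toDF("value")

// posexplode in select yields the default columns pos and col;
// rename col if you want a friendlier name
df.select($"value", posexplode($"value"))
  .withColumnRenamed("col", "item")
  .show(false)
```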
posexplode will create two new columns, one with the value and one with the index. You can use:
dataFrame.select(posexplode($"value") as Seq("pos", "val"))
This will give you a dataframe containing the indices and values.
However, if you only want the actual value, it's better to use explode:
dataFrame.withColumn("val", explode($"value"))
posexplode expects 2 alias names to represent the index it's creating and the actual value it's extracting from your array in each row.
What you can do is use a LATERAL VIEW after registering your table or view (the registration method differs with the Spark version you are using), like:
select ind, val from table_name LATERAL VIEW posexplode(value) exploded_values as ind, val
I haven't tried this as I am at my workplace, but you can definitely try it out.
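A runnable sketch of the LATERAL VIEW route (the SparkSession setup, the sample data, and the view name table_name are assumptions; createOrReplaceTempView is the Spark 2.x+ registration call):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("lateral-view-demo").getOrCreate()
import spark.implicits._

// sample array column named "value", registered as a temp view
val df = Seq(Seq(0.0, 7.0), Seq(2.0, 3.0)).toDF("value")
df.createOrReplaceTempView("table_name")

// LATERAL VIEW posexplode produces one row per array element,
// aliased here as ind (position) and val (element)
spark.sql(
  "select ind, val from table_name LATERAL VIEW posexplode(value) exploded_values as ind, val"
).show
```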