Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does posexplode fail with "AnalysisException: The number of aliases supplied in the AS clause does not match the number of columns..."?

This is my dataframe:

+------------------------------------------
|value   
+------------------------------------------
|[0.0, 1.0, 0.0, 7.0000000000000036, 0.0]
|[2.0000000000000036, 0.0, 2.9999999999999996, 4.0000000000000036, 5.000000000000002]
|[4.000000000000006, 0.0, 0.0, 6.000000000000006, 7.000000000000004]  
+------------------------------------------

When I use:

dataFrame.withColumn("item_id", posexplode(dataFrame.col("value")))

I got this error:

org.apache.spark.sql.AnalysisException: The number of aliases supplied in the AS clause does not match the number of columns output by the UDTF expected 2 aliases but got item_id ;

So, how to use posexplode to "Creates a new row for each element with position in the given array or map column."

like image 366
Pi Pi Avatar asked Jan 03 '18 08:01

Pi Pi


4 Answers

Things are clear if you'd use explode in .withColumn().

In contrast, before low-prioritized Spark-20174 gets accepted and implemented, the use of posexplode along with withColumn isn't straight forward. You may want to use a workaround based on selectExpr as shown below.

val df = Seq(
  ("a", Seq(1,2,3)),
  ("b", Seq(11,22))).toDF("n", "s")
df show

+---+---------+
|  n|        s|
+---+---------+
|  a|[1, 2, 3]|
|  b| [11, 22]|
+---+---------+
df selectExpr("*", "posexplode(s) as (p,c)") drop("s") show

+---+---+---+
|  n|  p|  c|
+---+---+---+
|  a|  0|  1|
|  a|  1|  2|
|  a|  2|  3|
|  b|  0| 11|
|  b|  1| 22|
+---+---+---+
like image 58
vak Avatar answered Nov 20 '22 14:11

vak


You can use posexplode with the select as below

dataframe.select($"value", posexplode($"value")).show(false)

Which returns two new columns as pos and col

Hope this helps!

like image 37
koiralo Avatar answered Nov 20 '22 13:11

koiralo


posexplode will create two new columns, one with the value and one with the index. You can use:

dataFrame.select(posexplode($"value") as Seq("pos", "val"))

This will give you a dataframe containing the indices and values.


However, if you only want the actual value it's better to use explode:

dataFrame.withColumn("val", explode($"value"))
like image 3
Shaido Avatar answered Nov 20 '22 13:11

Shaido


posexplode expects 2 reference names to represent the index its creating and actual value that its extracting from your array in each row.

What you can do is use 'Lateral View' after registering your table or view (differ with spark version you are using) like:

select ind, val from table_name LATERAL VIEW posexplode(values) exploded_values as ind, val

I haven't tried this as I am at workplace but you can definitely try this out.

like image 1
Avik Aggarwal Avatar answered Nov 20 '22 14:11

Avik Aggarwal