How to give alias name for posexplode columns in Spark SQL?

Tags: sql, apache-spark, apache-spark-sql

The statement below generates "pos" and "col" as the default column names when I use the posexplode() function in Spark SQL:

scala> spark.sql(""" with t1(select to_date('2019-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) from t1 """).show(false)
+----------+----------+---+---+
|first_day |last_day  |pos|col|
+----------+----------+---+---+
|2019-01-01|2019-01-31|0  |5  |
|2019-01-01|2019-01-31|1  |6  |
|2019-01-01|2019-01-31|2  |7  |
+----------+----------+---+---+
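To make the default naming concrete, here is a plain-Scala model (no Spark involved) of what posexplode produces for one input row: one output row per array element, carrying the 0-based position and the element itself. The object PosExplodeModel, the case class PosCol and the function posexplode are hypothetical names chosen for illustration. This is a sketch whose field names mirror Spark's default pos and col, not Spark API.

```scala
object PosExplodeModel {
  // Hypothetical row type; field names mirror Spark's default posexplode output columns.
  final case class PosCol(pos: Int, col: Int)

  // One output row per element; positions are 0-based, as in the Spark output above.
  def posexplode(arr: Seq[Int]): Seq[PosCol] =
    arr.zipWithIndex.map { case (value, index) => PosCol(index, value) }
}
```

For example, PosExplodeModel.posexplode(Seq(5, 6, 7)) returns PosCol(0,5), PosCol(1,6), PosCol(2,7), matching the pos and col columns shown above.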

What is the syntax for overriding those default names in spark.sql? With DataFrames, this can be done by passing a sequence of names to .as, e.g. df.select(posexplode('arr).as(Seq("arr_pos","arr_val"))):

scala> val arr= Array(5,6,7)
arr: Array[Int] = Array(5, 6, 7)

scala> Seq(("dummy")).toDF("x").select(posexplode(lit(arr)).as(Seq("arr_pos","arr_val"))).show(false)
+-------+-------+
|arr_pos|arr_val|
+-------+-------+
|0      |5      |
|1      |6      |
|2      |7      |
+-------+-------+
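The names passed to .as(Seq(...)) are matched to posexplode's output columns by position, so the first name labels the index and the second labels the element; writing Seq("arr_val", "arr_pos") would put the label arr_val on the positions. A minimal plain-Scala sketch of that positional matching follows, assuming a hypothetical helper applyAliases. As far as I know, Spark reports a mismatched alias count as an analysis error; this model simply fails via require instead.

```scala
object MultiAliasModel {
  // posexplode's output columns in order: position first, then element.
  val PosExplodeColumns: Seq[String] = Seq("pos", "col")

  // Hypothetical helper: pair each output column with the alias at the same index.
  def applyAliases(aliases: Seq[String]): Seq[(String, String)] = {
    require(
      aliases.length == PosExplodeColumns.length,
      s"expected ${PosExplodeColumns.length} aliases but got ${aliases.length}"
    )
    PosExplodeColumns.zip(aliases)
  }
}
```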

How can I get the same result in SQL? I tried

spark.sql(""" with t1(select to_date('2011-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) as(Seq('p','c')) from t1 """).show(false)

and

spark.sql(""" with t1(select to_date('2011-01-01') first_day) select first_day,date_sub(add_months(first_day,1),1) last_day, posexplode(array(5,6,7)) as(('p','c')) from t1 """).show(false)

but both of them throw errors.


asked Jan 22 '19 13:01

stack0114106

1 Answer

You can either use LATERAL VIEW:

spark.sql("""
  WITH t1 AS (SELECT to_date('2011-01-01') first_day)
  SELECT first_day, date_sub(add_months(first_day,1),1) last_day, p, c
  FROM t1
  LATERAL VIEW  posexplode(array(5,6,7)) AS p, c
""").show

+----------+----------+---+---+
| first_day|  last_day|  p|  c|
+----------+----------+---+---+
|2011-01-01|2011-01-31|  0|  5|
|2011-01-01|2011-01-31|  1|  6|
|2011-01-01|2011-01-31|  2|  7|
+----------+----------+---+---+
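Conceptually, LATERAL VIEW pairs every row of t1 with every row the generator produces for it, much like a flatMap. The sketch below models that in plain Scala for the query above; InRow, OutRow and lateralViewPosexplode are hypothetical names, and the dates are kept as strings for simplicity. One behaviour the model shares with Spark: an input row whose array is empty produces no output rows (Spark offers LATERAL VIEW OUTER to keep such rows, with nulls for the generated columns).

```scala
object LateralViewModel {
  // Hypothetical row types; dates are plain strings to keep the sketch self-contained.
  final case class InRow(firstDay: String, lastDay: String)
  final case class OutRow(firstDay: String, lastDay: String, p: Int, c: Int)

  // Models: SELECT first_day, last_day, p, c FROM t1 LATERAL VIEW posexplode(arr) AS p, c
  // Each input row is repeated once per array element; an empty array yields no rows.
  def lateralViewPosexplode(rows: Seq[InRow], arr: Seq[Int]): Seq[OutRow] =
    rows.flatMap { row =>
      arr.zipWithIndex.map { case (value, index) =>
        OutRow(row.firstDay, row.lastDay, index, value)
      }
    }
}
```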

or a tuple of aliases:

spark.sql("""
  WITH t1 AS (SELECT to_date('2011-01-01') first_day)
  SELECT first_day, date_sub(add_months(first_day,1),1) last_day,
         posexplode(array(5,6,7)) AS (p, c) 
  FROM t1 
""").show

+----------+----------+---+---+
| first_day|  last_day|  p|  c|
+----------+----------+---+---+
|2011-01-01|2011-01-31|  0|  5|
|2011-01-01|2011-01-31|  1|  6|
|2011-01-01|2011-01-31|  2|  7|
+----------+----------+---+---+

Tested with Spark 2.4.0.

Please note that aliases are identifiers, not strings, so they shouldn't be quoted with ' or ". If you need non-standard identifiers (for example, names containing spaces), quote them with backticks instead, e.g.:

WITH t1 AS (SELECT to_date('2011-01-01') first_day)
SELECT first_day, date_sub(add_months(first_day,1),1) last_day,
       posexplode(array(5,6,7)) AS (`arr pos`, `arr_value`) 
FROM t1
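If the alias list is built programmatically, a small helper can decide when backticks are needed. This is a sketch under stated assumptions: quoteIfNeeded and aliasClause are hypothetical names, the "plain identifier" rule is simplified to letters, digits and underscores (reserved words are not checked), and an embedded backtick is escaped by doubling it, which is how Spark SQL escapes a backtick inside a quoted identifier.

```scala
object IdentifierQuoting {
  // Simplified rule for identifiers that can stay unquoted (reserved words are NOT checked).
  private val Plain = "[A-Za-z_][A-Za-z0-9_]*".r

  // Hypothetical helper: leave simple identifiers bare, otherwise wrap them in backticks
  // and escape any embedded backtick by doubling it.
  def quoteIfNeeded(name: String): String = name match {
    case Plain() => name
    case _       => "`" + name.replace("`", "``") + "`"
  }

  // Builds the "AS (a, b)" multi-alias clause for a generator such as posexplode.
  def aliasClause(names: Seq[String]): String =
    names.map(quoteIfNeeded).mkString("AS (", ", ", ")")
}
```

For the query above, IdentifierQuoting.aliasClause(Seq("arr pos", "arr_value")) yields AS (`arr pos`, arr_value); backticks around arr_value are allowed but not required.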


answered Oct 17 '22 02:10

user10938362

