AWS Athena ALIAS in Group By does not get resolved

Tags: alias, sql, hive, amazon-athena, presto

I have a very basic GROUP BY query in Athena where I would like to use an alias. The example can be made to work by repeating the same expression in the GROUP BY, but that is not really handy when complex column modifications are going on and the logic has to be copied to two places. I also did that in the past, and now I have a statement that doesn't work when copied over.

Problem:

SELECT 
    substr(accountDescriptor, 5) as account, 
    sum(revenue) as grossRevenue 
FROM sales 
GROUP BY account

This will throw an error:

alias Column 'account' cannot be resolved

The following works, so it's about the alias handling.

SELECT 
    substr(accountDescriptor, 5) as account, 
    sum(revenue) as grossRevenue 
FROM sales 
GROUP BY substr(accountDescriptor, 5)

207

asked Feb 10 '20 00:02

supernova

2 Answers

That is because SQL is evaluated in a certain order: table scan, filter, aggregation, projection, sort. You tried to use the result of the projection as the input of the aggregation. In many cases this would be possible (where the projection is trivial, as in your case), but such behaviour is not defined in ANSI SQL (which Presto, and therefore Athena, follows).

We see that this is very useful in many cases, so support for it might be added in the future (extending ANSI SQL).
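The evaluation order described above can be modelled in a few lines of plain Python. This is only a rough sketch of the logical order, not of how Presto or Athena actually executes a query; the sample rows and the helper `key_expr` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for the sales table.
sales = [
    {"accountDescriptor": "ACC-north", "revenue": 10},
    {"accountDescriptor": "ACC-north", "revenue": 5},
    {"accountDescriptor": "ACC-south", "revenue": 7},
]


def key_expr(row):
    # Mirrors substr(accountDescriptor, 5): start at the 5th character (1-based).
    return row["accountDescriptor"][4:]


# Step 1: table scan (this query has no filter step).
scanned = list(sales)

# Step 2: aggregation. Only input columns exist at this point; the name
# "account" has not been assigned yet, so the grouping key has to be written
# as an expression over the input columns.
totals = defaultdict(int)
for row in scanned:
    totals[key_expr(row)] += row["revenue"]

# Step 3: projection. Output names (aliases) are assigned only here.
projected = [{"account": k, "grossRevenue": v} for k, v in totals.items()]

# Step 4: sort (added here only to make the printed output deterministic).
projected.sort(key=lambda r: r["account"])
print(projected)
```

In this model the alias `account` first appears in step 3, which is why step 2 cannot refer to it.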

Currently, there are several ways to overcome this:

-- 1. Compute the alias in a subquery, then group by it
SELECT account, sum(revenue) as grossRevenue 
FROM (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
GROUP BY account

-- 2. Same idea with a common table expression
WITH better_sales AS (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
SELECT account, sum(revenue) as grossRevenue 
FROM better_sales
GROUP BY account

-- 3. Lateral join (written as CROSS JOIN LATERAL in Presto)
SELECT account, sum(revenue) as grossRevenue 
FROM sales
CROSS JOIN LATERAL (SELECT substr(accountDescriptor, 5) as account)
GROUP BY account

-- 4. Refer to the SELECT column by its position
SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY 1;
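As a quick sanity check, the subquery, CTE and positional variants can be compared against the repeated-expression query on a small made-up data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for Athena, which is an assumption made purely so the example runs locally. Unlike Athena, SQLite does accept the alias in GROUP BY, so this does not reproduce the error; it only checks that the workarounds return the same rows. The lateral variant is left out because SQLite has no LATERAL.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a local stand-in for Athena.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (accountDescriptor TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ACC-north", 10), ("ACC-north", 5), ("ACC-south", 7)],
)

QUERIES = {
    "repeated expression": """
        SELECT substr(accountDescriptor, 5) AS account, sum(revenue) AS grossRevenue
        FROM sales
        GROUP BY substr(accountDescriptor, 5)
    """,
    "subquery": """
        SELECT account, sum(revenue) AS grossRevenue
        FROM (SELECT substr(accountDescriptor, 5) AS account, revenue FROM sales)
        GROUP BY account
    """,
    "cte": """
        WITH better_sales AS (
            SELECT substr(accountDescriptor, 5) AS account, revenue FROM sales
        )
        SELECT account, sum(revenue) AS grossRevenue
        FROM better_sales
        GROUP BY account
    """,
    "position": """
        SELECT substr(accountDescriptor, 5) AS account, sum(revenue) AS grossRevenue
        FROM sales
        GROUP BY 1
    """,
}


def run(sql):
    # Sort in Python so the comparison does not depend on row order.
    return sorted(conn.execute(sql).fetchall())


results = {name: run(sql) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

All four variants should print the same two rows, one per account.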

118

answered Nov 05 '22 22:11

kokosing

In addition to the answers from kokosing and Gordon Linoff, you can use numbers that represent the position of the grouped column in the SELECT list. Such an approach can also give you better performance, as described in section 8 of this AWS Blog. For example:

SELECT
    substr(accountDescriptor, 5) as account,
    sum(revenue) as grossRevenue
FROM sales
GROUP BY 1

Note: numbering starts from one and not from zero.

Here 1 effectively acts as an alias for account. The main downside is that if you change the order of your columns within SELECT, then you also need to account for that within GROUP BY:

SELECT
    sum(revenue) as grossRevenue,
    substr(accountDescriptor, 5) as account
FROM sales
GROUP BY 2
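To see that the two queries above are equivalent once the position is updated, here is a small sketch that runs both on made-up data. It assumes Python's built-in sqlite3 module as a local stand-in for Athena; the table contents are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a local stand-in for Athena.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (accountDescriptor TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ACC-north", 10), ("ACC-north", 5), ("ACC-south", 7)],
)

# account is the first SELECT column, so it is grouped by position 1.
original = conn.execute("""
    SELECT substr(accountDescriptor, 5) AS account, sum(revenue) AS grossRevenue
    FROM sales
    GROUP BY 1
""").fetchall()

# After swapping the columns, account is at position 2.
reordered = conn.execute("""
    SELECT sum(revenue) AS grossRevenue, substr(accountDescriptor, 5) AS account
    FROM sales
    GROUP BY 2
""").fetchall()

# Normalise both results to {account: grossRevenue} before comparing.
by_account_original = dict(original)
by_account_reordered = {account: total for total, account in reordered}
print(by_account_original)
print(by_account_reordered)
```

Both dictionaries should hold the same totals per account; the position in GROUP BY simply has to follow wherever the grouped column ends up in the SELECT list.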

answered Nov 05 '22 23:11

Ilya Kisil
