Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SHOW PARTITIONS with order by in Amazon Athena

I have this query:

SHOW PARTITIONS tablename;

Result is:

dt=2018-01-12
dt=2018-01-20
dt=2018-05-21
dt=2018-04-07
dt=2018-01-03

This gives the list of partitions per table. The partition field for this table is dt which is a date column. I want to see the partitions ordered.

The documentation doesn't explain how to do it: https://docs.aws.amazon.com/athena/latest/ug/show-partitions.html

I tried to add order by:

SHOW PARTITIONS tablename order by dt;

But it gives:

AmazonAthena; Status Code: 400; Error Code: InvalidRequestException;

like image 934
jack Avatar asked Jun 07 '18 08:06

jack


People also ask

How do I display partitions in Athena?

To show the partitions in a table and list them in a specific order, see the Listing partitions for a specific table section on the Querying AWS Glue Data Catalog page. To view the contents of a partition, see the Query the data section on the Partitioning data in Athena page.

How many partitions can an Athena table have?

Although Athena supports querying AWS Glue tables that have 10 million partitions, Athena cannot read more than 1 million partitions in a single scan.

How can I see mysql partitions?

Using the SHOW TABLE STATUS statement to determine whether a table is partitioned. Querying the INFORMATION_SCHEMA. PARTITIONS table. Using the statement EXPLAIN SELECT to see which partitions are used by a given SELECT .

How can I see partitions in hive?

You can see Hive MetaStore tables,Partitions information in table of "PARTITIONS". You could use "TBLS" join "Partition" to query special table partitions.


Video Answer


2 Answers

AWS currently (as of Nov 2020) supports two versions of the Athena engines. How one selects and orders partitions depends upon which version is used.

Version 1:

Use the information_schema table. Assuming you have year, month as partitions (with one partition key, this is of course simpler):

WITH 
 a as (
SELECT partition_number as pn, partition_key as key, partition_value as val
FROM   information_schema.__internal_partitions__
WHERE  table_schema = 'my_database'
       AND table_name = 'my_table'
 )
SELECT 
  year, month
FROM (
    SELECT val as year, pn FROM a WHERE key = 'year'
) y
JOIN (
    SELECT val as month, pn FROM a WHERE key = 'month'
) m ON m.pn = y.pn
ORDER BY year, month

which outputs:

  year month
0 2018    10
0 2018    11
0 2018    12
0 2019    01
...

Version 2:

Use the built-in $partitions functionality, where the partitions are explicitly available as columns and the syntax is much simpler:

SELECT year, month FROM my_database."my_table$partitions" ORDER BY year, month
  year month
0 2018    10
0 2018    11
0 2018    12
0 2019    01
...

For more information, see:

https://docs.aws.amazon.com/athena/latest/ug/querying-glue-catalog.html#querying-glue-catalog-listing-partitions

like image 165
MikeGM Avatar answered Oct 09 '22 22:10

MikeGM


From your comment it sounds like you're looking to sort the partitions as a way to figure out whether or not a specific partition exists. For this purpose I suggest you use the Glue API instead of querying Athena. Run aws glue get-partition help or check your preferred SDK's documentation for how it works.

There is also a variant to list all partitions of a table, run aws glue get-partitions help to read more about that. I don't think it returns the partitions in alphabetical order, but it has operators for filtering.

like image 23
Theo Avatar answered Oct 09 '22 22:10

Theo