
Athena/Presto: Getting maximum partition value, at cheapest scan cost

I want to get the maximum value of the partition column in my Athena table. Since Athena charges by the volume of data scanned, I am looking for a way to do this with the smallest possible scan.

Admittedly, I have little data in there now, but it will grow over time once in production.

Does anyone know what happens under the hood for these two approaches, how they differ, and which is more efficient?

Thanks

Method (1)

SELECT max(dt) 
FROM mydb.mytable 

-- Console output: Time in queue: 0.166 sec, Run time: 3.153 sec, Data scanned: -

Method (2)

SELECT max(dt) 
FROM mydb."mytable$partitions" 

-- Console output: Time in queue: 0.223 sec, Run time: 1.347 sec, Data scanned: 0.02 KB

SimonB asked Oct 28 '25 05:10


1 Answer

A very late answer, but this question helped me a lot, so I looked into it. Maybe it can help others:

SHOW PARTITIONS lists the partitions in metadata.

If you want the equivalent of SHOW PARTITIONS as a regular query, use:

SELECT * FROM "table_name$partitions"

The second example you posted is faster because it reads only the partition metadata and does not look into the filesystem (S3).

AWS Documentation: https://docs.aws.amazon.com/athena/latest/ug/show-partitions.html
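
As a rough sketch of how the metadata lookup might be combined with a cheap read of the actual data, the two steps below reuse the table name from the question (mydb.mytable, partitioned by dt). The date literal in the second query is only a placeholder and assumes dt is stored as a string; substitute whatever value the first query returns for you.

```sql
-- Step 1: read the latest partition value from metadata only
-- (same idea as Method (2) in the question).
SELECT max(dt) AS latest_dt
FROM mydb."mytable$partitions";

-- Step 2: filter on the partition column with a literal value so that
-- only that partition's files are scanned.
-- '2025-10-28' is a placeholder, not a real result.
SELECT *
FROM mydb.mytable
WHERE dt = '2025-10-28';
```

Running the steps separately and passing the value as a literal keeps the partition filter explicit, so the scan in the second query should be limited to the matching partition.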

Alejandro answered Oct 30 '25 14:10


