What exactly is "$path" used for? I just ran select "$path" from table limit 10 in Athena, and it shows the S3 file path where the data is stored. But with limit 10 it shows the same path 10 times, and if I don't limit the statement it scans the entire data set. Can someone please explain?
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Presto, by contrast, is described as a "Distributed SQL Query Engine for Big Data".
Athena engine version 2 is based on Presto 0.217. For information about related functions, operators, and expressions, see the Presto 0.217 functions and operators documentation.
When you use CREATE_TABLE, Athena defines a STRUCT in it, populates it with data, and creates the ROW data type for you, for each row in the dataset. The underlying ROW data type consists of named fields of any supported SQL data types.
"$path"
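is a pseudo-column that evaluates to the path of the source file a given row comes from. It is provided by Presto's Hive connector. The value is produced once per row, not once per file, so if you have a file with 100 rows, you will get the same path 100 times. Without a limit, the query returns one value for every row in the table, which is why it goes through all of the data.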
If you want the first ten distinct paths, try:
select DISTINCT "$path" from table limit 10
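To see why the two queries behave differently, here is a minimal Python sketch that only models the per-row behavior described above. It does not talk to Athena. The bucket name, file names, and row counts are made up for illustration, and a real engine does not guarantee the order in which files are read.

```python
from itertools import islice

# Hypothetical table layout: three S3 objects with 100 rows each.
# All names and counts below are assumptions for illustration only.
files = {
    "s3://example-bucket/data/part-0000.csv": ["row"] * 100,
    "s3://example-bucket/data/part-0001.csv": ["row"] * 100,
    "s3://example-bucket/data/part-0002.csv": ["row"] * 100,
}

def scan_paths(table_files):
    """Model of: select "$path" from table (one path per row)."""
    for path, rows in table_files.items():
        for _ in rows:
            yield path

def distinct(values):
    """Model of DISTINCT: pass each value through only once."""
    seen = set()
    for value in values:
        if value not in seen:
            seen.add(value)
            yield value

# Model of: select "$path" from table limit 10
limited = list(islice(scan_paths(files), 10))

# Model of: select DISTINCT "$path" from table limit 10
distinct_limited = list(islice(distinct(scan_paths(files)), 10))

print(len(limited), len(set(limited)))
print(len(distinct_limited))
```

In this model the plain limit stops after ten rows of the first file, so every returned path is identical, while the DISTINCT version has to look at every row and returns one entry per file.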