
What is the "$path" pseudo-column, and what is it used for in Athena (Presto)?

What exactly is "$path" used for? I ran "select "$path" from table limit 10" in Athena, and it shows the S3 path of the file the data comes from. With limit 10 it shows the same path 10 times, and without the limit it scans the entire data set. Can someone please explain?

Roy asked Feb 12 '19 20:02



1 Answer

"$path" is a pseudo-column that evaluates to the path of the source file a given row comes from. It is provided by Presto's Hive connector. Because it is evaluated per row, a file with 100 rows returns the same path 100 times.

If you want the first ten distinct paths, try:

select DISTINCT "$path" from table limit 10
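
Since "$path" behaves like an ordinary column in a query, it can also be grouped on or filtered on. The sketch below is only an illustration: my_table and the S3 path are hypothetical placeholders, not names from the question.

```sql
-- Count how many rows each source file contributes.
-- my_table is a hypothetical table name.
SELECT "$path" AS file_path,
       count(*) AS row_count
FROM my_table
GROUP BY "$path"
ORDER BY row_count DESC
LIMIT 10;

-- Return only the rows that come from one specific file.
-- The S3 path below is a made-up placeholder.
SELECT *
FROM my_table
WHERE "$path" = 's3://example-bucket/example-prefix/part-00000.csv';
```

The first query is one way to see which files hold the most rows; the second can help trace a bad record back to the file it was read from.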
Piotr Findeisen answered Nov 15 '22 05:11