Select rows by index in Amazon Athena

This is a very simple question, but I can't seem to find documentation on it. How would one query rows by index (i.e., select the 10th through 20th rows of a table)?

I know there's a row_number() function, but it doesn't seem to do what I want.

asked Jun 28 '18 at 19:06 by Ajjit Narayanan


2 Answers

Do not specify any partition, so the row number will be an integer between 1 and the number of records in the table. Without an ORDER BY inside OVER (), which rows receive which numbers is not guaranteed.

SELECT * FROM (
    SELECT *, row_number() OVER () AS row_num
    FROM your_table
)
WHERE row_num BETWEEN 100000 AND 100010
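To try this pattern without an Athena account, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a stand-in for Athena (it supports window functions from version 3.25), and the table your_table with its payload column is hypothetical sample data. It keeps rows 10 through 20, as in the question.

```python
import sqlite3

# Stand-in for Athena: in-memory SQLite (window functions need SQLite >= 3.25).
conn = sqlite3.connect(":memory:")

# Hypothetical table with 50 rows of sample data.
conn.execute("CREATE TABLE your_table (payload TEXT)")
conn.executemany(
    "INSERT INTO your_table (payload) VALUES (?)",
    [(f"row-{i}",) for i in range(1, 51)],
)

# Same shape as the answer: number every row with an empty OVER (),
# then filter on the row_num alias in an outer query.
query = """
SELECT payload, row_num FROM (
    SELECT payload, row_number() OVER () AS row_num
    FROM your_table
)
WHERE row_num BETWEEN 10 AND 20
"""
rows = conn.execute(query).fetchall()
row_nums = [row_num for _, row_num in rows]

# The set of numbers is fixed; which payload got which number is not.
print(sorted(row_nums))  # [10, 11, ..., 20]
```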
answered Oct 18 '22 at 23:10 by woshitom


I seem to have found a roundabout and clunky way of doing this in Athena, so better answers are welcome. This approach requires that your table already has a numeric column, here named some_numeric_column:

SELECT some_numeric_column, row_num FROM (
    SELECT some_numeric_column,
           row_number() OVER (ORDER BY some_numeric_column) AS row_num
    FROM your_table
)
WHERE row_num BETWEEN 100000 AND 100010

To explain: the inner query selects the numeric column and adds a column, row_num, that numbers the rows in ascending order of that column. The outer SELECT is needed because the WHERE clause is evaluated before window functions and SELECT-list aliases, so row_num can't be filtered on in the same SELECT that defines it. Without the wrapping SELECT, Athena fails with an error about not finding a column named row_num.
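The same two-level structure can also be written with a WITH clause (common table expression): the window function is computed in one query level and filtered in the next. Below is a minimal sketch of the ordered variant, run against SQLite as a stand-in for Athena. The helper select_rows_by_index is hypothetical, the table and column names are this answer's placeholders, and the data is made up for illustration. Because identifiers cannot be bound as query parameters, the table and column names are interpolated into the SQL string, so only pass trusted names.

```python
import sqlite3


def select_rows_by_index(conn, table, order_col, start, end):
    """Hypothetical helper: return (value, row_num) pairs for rows start..end
    (1-based, inclusive), numbered by ascending order_col.

    table and order_col are interpolated directly, so they must be trusted.
    """
    query = f"""
    WITH numbered AS (
        SELECT {order_col},
               row_number() OVER (ORDER BY {order_col}) AS row_num
        FROM {table}
    )
    SELECT {order_col}, row_num
    FROM numbered
    WHERE row_num BETWEEN ? AND ?
    ORDER BY row_num
    """
    return conn.execute(query, (start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (some_numeric_column INTEGER)")
# Insert 50..1 in reverse so insertion order differs from sort order.
conn.executemany(
    "INSERT INTO your_table VALUES (?)",
    [(n,) for n in range(50, 0, -1)],
)

result = select_rows_by_index(conn, "your_table", "some_numeric_column", 10, 20)
print(result)  # [(10, 10), (11, 11), ..., (20, 20)]
```

If the ordering column contains duplicates, tied rows can still be numbered in any order; adding a second, unique column to the ORDER BY makes the numbering repeatable.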

answered Oct 18 '22 at 21:10 by Ajjit Narayanan