Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Boto3 athena query without saving data to s3

I am trying to use boto3 to run a set of queries and don't want to save the data to s3. Instead I just want to get the results and want to work with those results. I am trying to do the following

import boto3

client = boto3.client('athena')
response = client.start_query_execution(
        QueryString='''SELECT * FROM mytable limit 10''',
        QueryExecutionContext={
            'Database': 'my_db'
            }.        
        ResultConfiguration={
            'OutputLocation': 's3://outputpath',
            }
        )

print(response)

But here I don't want to give ResultConfiguration because I don't want to write the results anywhere. But If I remove the ResultConfiguration parameter I get the following error

botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in input: "ResultConfiguration"

So it seems like giving s3 output location for writing is mendatory. So what could the way to avoid this and get the results only in response?

like image 326
muazfaiz Avatar asked Oct 24 '18 10:10

muazfaiz


People also ask

Can Athena directly query from S3?

Athena can query Amazon S3 Inventory files in ORC, Parquet, or CSV format. When you use Athena to query inventory, we recommend that you use ORC-formatted or Parquet-formatted inventory files. ORC and Parquet formats provide faster query performance and lower query costs.

What's difference between S3 query and Athena query?

So Whats the Difference Between S3 Select and Athena? S3 Select is a lightweight solution designed to let you use SQL to perform simple SELECT clauses on a maximum of one file. Amazon Athena is an analytics workhorse that allows you to perform SQL on extremely large datasets spanning many files with great performance.

Does Athena cache query results?

Amazon Athena automatically stores query results and metadata information for each query that runs in a query result location that you can specify in Amazon S3. If necessary, you can access the files in this location to work with them.

How do you make Athena queries faster?

Compress and split files You can speed up your queries dramatically by compressing your data, provided that files are splittable or of an optimal size (optimal S3 file size is between 200MB-1GB). Smaller data sizes mean less network traffic between Amazon S3 to Athena.


1 Answers

You will have to specify an S3 temp bucket location whenever running the 'start_query_execution' command. However, you can get a result set (a dict) by running the 'get_query_results' method using the query id.

The response (dict) will look like this:

{
'UpdateCount': 123,
'ResultSet': {
    'Rows': [
        {
            'Data': [
                {
                    'VarCharValue': 'string'
                },
            ]
        },
    ],
    'ResultSetMetadata': {
        'ColumnInfo': [
            {
                'CatalogName': 'string',
                'SchemaName': 'string',
                'TableName': 'string',
                'Name': 'string',
                'Label': 'string',
                'Type': 'string',
                'Precision': 123,
                'Scale': 123,
                'Nullable': 'NOT_NULL'|'NULLABLE'|'UNKNOWN',
                'CaseSensitive': True|False
            },
        ]
    }
},
'NextToken': 'string'
}

For more information, see boto3 client doc: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_results

You can then delete all files in the S3 temp bucket you've specified.

like image 95
Ishay Hilzenrat Avatar answered Sep 21 '22 21:09

Ishay Hilzenrat