Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Athena equivalent to information_schema

For background, I come from a SQLServer background and make heavy use of the system tables & information_schema, to tell me all about my tables and columns.

I didn't expect the exact same power in Athena, but currently very shocked and frustrated with what little seems to be available - unless I've missed something ?

For example, 'describe mytable' - just describes 1 table at a time. How about showing the columns for ALL tables in one result ? It also does not output the table name, nor allow you to manually add that in as a custom column.

All the results of these "show/list/describe" commands seem to produce a text list - not a recordset, so you cannot take the results and join them to other tables or views to make more complex outputs.

Is there any other way to query the contents of my databases ?

Thanks in advance

like image 908
SimonB Avatar asked Jul 01 '19 22:07

SimonB


People also ask

What does Athena use under the hood?

Under the hood, Athena uses Presto to process DML statements and Hive to process the DDL statements that create and modify schema. With these technologies, there are a couple of conventions to follow so that Athena and AWS Glue work well together.

How do you get table names in Athena?

Want to get all above details in Athena and this is the approach I am following. SELECT table_name FROM information_schema. tables WHERE table_schema = 'logging' // Lists all the tables under logging schema.

How do I find the size of a table in Athena?

The quick way is via s3: ... > Show Properties > Location and lookup the size in the s3-console. You can run SELECT * FROM some_table for each table and look at the result metadata for the amount scanned, but it will be an expensive way to do it.

How do I get a table to DDL in Athena?

You can get the CREATE TABLE DDL statement from Athena, by calling StartQueryExecution() from your code, waiting for the query to complete and then downloading the results file or using the GetQueryResults() API. Athena uses the Glue Data Catalog as a Hive metastore.


1 Answers

Athena is based on Presto. Presto provides information_schema schema and I checked and it is accessible in Athena.

You can run e.g. a query like:

SELECT * FROM information_schema.columns;

to get a list of columns of all tables.

You can filter this by "database":

SELECT * FROM information_schema.columns WHERE table_schema = '<databasename>';

Note however that these types of queries are not necessarily very performant.

like image 149
Piotr Findeisen Avatar answered Sep 27 '22 17:09

Piotr Findeisen