For background, I come from SQL Server and make heavy use of the system tables and information_schema to tell me all about my tables and columns.
I didn't expect the exact same power in Athena, but I am shocked and frustrated by how little seems to be available, unless I've missed something?
For example, 'describe mytable' describes just one table at a time. How about showing the columns for ALL tables in one result? It also does not output the table name, nor does it let you add one manually as a custom column.
All of these "show/list/describe" commands seem to produce a text list rather than a recordset, so you cannot take the results and join them to other tables or views to build more complex outputs.
Is there any other way to query the contents of my databases?
Thanks in advance
Under the hood, Athena uses Presto to process DML statements and Hive to process the DDL statements that create and modify schema. With these technologies, there are a couple of conventions to follow so that Athena and AWS Glue work well together.
I want to get all of the above details in Athena, and this is the approach I am following. The query below lists all the tables under the logging schema:
SELECT table_name FROM information_schema.tables WHERE table_schema = 'logging';
The quick way is via S3: ... > Show Properties > Location, then look up the size in the S3 console. You can also run SELECT * FROM some_table for each table and check the result metadata for the amount of data scanned, but that is an expensive way to do it.
You can get the CREATE TABLE DDL statement from Athena, by calling StartQueryExecution() from your code, waiting for the query to complete and then downloading the results file or using the GetQueryResults() API. Athena uses the Glue Data Catalog as a Hive metastore.
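The statement submitted through StartQueryExecution() in that approach would presumably be SHOW CREATE TABLE. A minimal sketch, where the database name logging is borrowed from the example above and some_table is a hypothetical table name:

```sql
-- Returns the CREATE TABLE DDL for one table as text output.
-- "logging" and "some_table" are placeholder names, not from a real catalog.
SHOW CREATE TABLE logging.some_table;
```

Like the other SHOW commands, this handles one table per call, so it has to be repeated for each table of interest.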
Athena is based on Presto. Presto provides an information_schema schema, and I checked that it is accessible in Athena.
You can run e.g. a query like:
SELECT * FROM information_schema.columns;
to get a list of columns of all tables.
You can filter this by "database":
SELECT * FROM information_schema.columns WHERE table_schema = '<databasename>';
Note however that these types of queries are not necessarily very performant.
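Because information_schema returns an ordinary result set, it can be joined and filtered like any other table, which the SHOW and DESCRIBE commands do not allow. A minimal sketch, assuming a database named logging (a placeholder) and the column names Presto defines for information_schema.tables and information_schema.columns; it is worth checking them against your Athena engine version:

```sql
-- One row per column, with the table name and table type alongside it.
-- 'logging' is a placeholder database name; replace it with your own.
SELECT c.table_schema,
       c.table_name,
       t.table_type,
       c.column_name,
       c.data_type,
       c.ordinal_position
FROM information_schema.columns AS c
JOIN information_schema.tables AS t
  ON t.table_schema = c.table_schema
 AND t.table_name = c.table_name
WHERE c.table_schema = 'logging'
ORDER BY c.table_name, c.ordinal_position;
```

Keeping the table_schema filter in place limits how much catalog metadata has to be read, which should help with the performance concern noted above.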