In AWS Athena you can write
SHOW CREATE TABLE my_table_name;
and see a SQL-like query that describes how to build the table's schema. It works for tables whose schema are defined in AWS Glue. This is very useful for creating tables in a regular RDBMS, for loading and exploring data views.
Interacting with Athena in this way is manual, and I would like to automate the process of creating regular RDBMS tables that have the same schema as those in Redshift Spectrum.
How can I do this through a query that can be run via psql
? Or is there another way to get this via the aws-cli
?
Amazon Redshift Spectrum uses external tables to query data that is stored in Amazon S3. You can query an external table using the same SELECT syntax you use with other Amazon Redshift tables. External tables are read-only. You can't write to an external table. You create an external table in an external schema.
Redshift Spectrum does not support SHOW CREATE TABLE syntax, but there are system tables that can deliver same information. I have to say, it's not as useful as the ready to use sql returned by Athena though. svv_external_columns - gives you the column names, types and order information.
If the Spectrum Table contains a Partition Key or Keys, Redshift automatically partitions new files and registers new partitions in the external catalog. CREATE EXTERNAL TABLE external_schema.table_name (column_name data_type [, …] ) [ PARTITIONED BY (col_name data_type [, …
With Amazon Redshift Spectrum, you can query data from Amazon Simple Storage Service (Amazon S3) without having to load data into Amazon Redshift tables. Amazon Redshift Spectrum processes any queries while the data remains in your Amazon S3 bucket.
Redshift Spectrum does not support SHOW CREATE TABLE
syntax, but there are system tables that can deliver same information. I have to say, it's not as useful as the ready to use sql returned by Athena though.
The tables are
svv_external_schemas
- gives you information about glue database mapping and IAM roles bound to itsvv_external_tables
- gives you the location information, and also data format and serdes usedsvv_external_columns
- gives you the column names, types and order information.Using that data, you could reconstruct the table's DDL.
For example to get the list of columns and their types in the CREATE TABLE
format one can do:
select distinct
listagg(columnname || ' ' || external_type, ',\n')
within group ( order by columnnum ) over ()
from svv_external_columns
where tablename = '<YOUR_TABLE_NAME>'
and schemaname = '<YOUR_SCHEM_NAME>'
the query give you the output similar to:
col1 int,
col2 string,
...
*) I am using listagg
window function and not the aggregate function, as apparently listagg
aggregate function can only be used with user defined tables. Bummer.
I had been doing something similar to @botchniaque's answer in the past, but recently stumbled across a solution in the AWS-Labs' amazon-redshift-utils code package that seems to be more reliable than my hand-spun queries:
amazon-redshift-utils: v_generate_external_tbl_ddl
If you don't have the ability to create a view backed with the ddl listed in that package, you can run it manually by removing the CREATE
statement from the start of the query. Assuming you can create it as a view, usage would be:
SELECT ddl
FROM admin.v_generate_external_tbl_ddl
WHERE schemaname = '<external_schema_name>'
-- Optionally include specific table references:
-- AND tablename IN ('<table_name_1>', '<table_name_2>', ..., '<table_name_n>')
ORDER BY tablename, seq
;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With