
Is there a way to describe an external/Spectrum table via Redshift?

Tags: ddl, amazon-redshift, amazon-redshift-spectrum

In AWS Athena you can write

SHOW CREATE TABLE my_table_name;

and see a SQL-like statement that describes how to build the table's schema. It works for tables whose schemas are defined in AWS Glue. This is very useful for creating tables in a regular RDBMS, for loading and exploring data views.

Interacting with Athena in this way is manual, and I would like to automate the process of creating regular RDBMS tables that have the same schema as those in Redshift Spectrum.

How can I do this through a query that can be run via psql? Or is there another way to get this via the aws-cli?


asked Dec 02 '19 21:12

New Alexandria

2 Answers

Redshift Spectrum does not support the SHOW CREATE TABLE syntax, but there are system views that can deliver the same information. I have to say, it's not as useful as the ready-to-use SQL returned by Athena, though.

The views are:

svv_external_schemas - gives you information about the Glue database mapping and the IAM roles bound to it
svv_external_tables - gives you the location information, as well as the data format and SerDes used
svv_external_columns - gives you the column names, types, and order information

Using that data, you could reconstruct the table's DDL.

For example, to get the list of columns and their types in CREATE TABLE format, you can run:

-- Build one "name type" entry per column, in column order
select distinct
       listagg(columnname || ' ' || external_type, ',\n')
             within group ( order by columnnum ) over ()
from svv_external_columns
where tablename = '<YOUR_TABLE_NAME>'
and schemaname = '<YOUR_SCHEMA_NAME>';

The query gives you output similar to:

col1 int, 
col2 string,
...

Note: I am using the listagg window function rather than the aggregate function because, apparently, the listagg aggregate function can only be used with user-defined tables. Bummer.
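To automate the reconstruction step, the column rows can be turned into a CREATE TABLE statement for a regular RDBMS in a small script. The following is only a minimal sketch: it assumes you have already fetched (columnname, external_type, columnnum) rows from svv_external_columns by whatever client you use, and the function name `build_create_table` and the `TYPE_MAP` table are hypothetical and illustrative rather than a complete Hive-to-PostgreSQL type mapping.

```python
# Hypothetical, partial mapping from external (Hive-style) type names to
# PostgreSQL type names. Extend it for the types your tables actually use.
TYPE_MAP = {
    "string": "text",
    "int": "integer",
    "bigint": "bigint",
    "double": "double precision",
    "boolean": "boolean",
    "timestamp": "timestamp",
}


def build_create_table(table_name, columns, type_map=TYPE_MAP):
    """Build a CREATE TABLE statement from svv_external_columns-style rows.

    columns: iterable of (columnname, external_type, columnnum) tuples.
    Types missing from type_map are passed through unchanged.
    """
    # Sort by columnnum so the column order matches the external table.
    ordered = sorted(columns, key=lambda col: col[2])
    if not ordered:
        raise ValueError("no columns supplied for table " + table_name)

    lines = []
    for name, external_type, _ in ordered:
        target_type = type_map.get(external_type.lower(), external_type)
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        lines.append(f"    {quoted} {target_type}")

    return f"CREATE TABLE {table_name} (\n" + ",\n".join(lines) + "\n);"


if __name__ == "__main__":
    rows = [("col2", "string", 2), ("col1", "int", 1)]
    print(build_create_table("public.my_table", rows))
```

Types that are not in the mapping (for example varchar(20) or decimal(10,2)) are emitted as-is, so review the generated statement before running it against the target database.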


answered Nov 10 '22 14:11

botchniaque

I had been doing something similar to @botchniaque's answer in the past, but I recently stumbled across a solution in the AWS Labs amazon-redshift-utils code package that seems to be more reliable than my hand-spun queries:

amazon-redshift-utils: v_generate_external_tbl_ddl

If you don't have the ability to create a view backed by the DDL listed in that package, you can run it manually by removing the CREATE VIEW clause from the start of the query, leaving only the SELECT. Assuming you can create it as a view, usage would be:

SELECT ddl
FROM admin.v_generate_external_tbl_ddl
WHERE schemaname = '<external_schema_name>'
    -- Optionally include specific table references:
    --     AND tablename IN ('<table_name_1>', '<table_name_2>', ..., '<table_name_n>')
ORDER BY tablename, seq
;
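Because the usage above orders by tablename and seq, the result appears to arrive as one DDL fragment per row. A script that automates this would then stitch the fragments back together per table. The sketch below assumes you select tablename, seq, and ddl (not just ddl) and have the rows in memory as tuples; the helper name `assemble_ddl` is hypothetical.

```python
from collections import defaultdict


def assemble_ddl(rows):
    """Group DDL fragments by table and join them in seq order.

    rows: iterable of (tablename, seq, ddl) tuples, in any order.
    Returns a dict mapping each table name to its full DDL text.
    """
    fragments = defaultdict(list)
    for tablename, seq, ddl in rows:
        fragments[tablename].append((seq, ddl))

    scripts = {}
    for tablename, parts in fragments.items():
        # Sort on seq only, so ties never fall back to comparing the text.
        parts.sort(key=lambda part: part[0])
        scripts[tablename] = "\n".join(ddl for _, ddl in parts)
    return scripts


if __name__ == "__main__":
    # Made-up fragments purely for illustration.
    sample = [
        ("t1", 2, "  col1 int"),
        ("t1", 1, "CREATE EXTERNAL TABLE s.t1 ("),
        ("t2", 1, "CREATE EXTERNAL TABLE s.t2 ("),
    ]
    for name, script in assemble_ddl(sample).items():
        print(f"-- {name}\n{script}\n")
```

Sorting inside the script means the result does not depend on the ORDER BY clause having been kept in the query.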

answered Nov 10 '22 14:11

John Stark
