Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to describe an external/spectrum table via redshift?

In AWS Athena you can write

SHOW CREATE TABLE my_table_name;

and see a SQL-like query that describes how to build the table's schema. It works for tables whose schema are defined in AWS Glue. This is very useful for creating tables in a regular RDBMS, for loading and exploring data views.

Interacting with Athena in this way is manual, and I would like to automate the process of creating regular RDBMS tables that have the same schema as those in Redshift Spectrum.

How can I do this through a query that can be run via psql? Or is there another way to get this via the aws-cli?

like image 219
New Alexandria Avatar asked Dec 02 '19 21:12

New Alexandria


People also ask

What is an external table in Amazon Redshift spectrum?

Amazon Redshift Spectrum uses external tables to query data that is stored in Amazon S3. You can query an external table using the same SELECT syntax you use with other Amazon Redshift tables. External tables are read-only. You can't write to an external table. You create an external table in an external schema.

Does redshift spectrum support SHOW CREATE TABLE syntax?

Redshift Spectrum does not support SHOW CREATE TABLE syntax, but there are system tables that can deliver same information. I have to say, it's not as useful as the ready to use sql returned by Athena though. svv_external_columns - gives you the column names, types and order information.

How do I partition a spectrum table in redshift?

If the Spectrum Table contains a Partition Key or Keys, Redshift automatically partitions new files and registers new partitions in the external catalog. CREATE EXTERNAL TABLE external_schema.table_name (column_name data_type [, …] ) [ PARTITIONED BY (col_name data_type [, …

What is Amazon Redshift spectrum and how does it work?

With Amazon Redshift Spectrum, you can query data from Amazon Simple Storage Service (Amazon S3) without having to load data into Amazon Redshift tables. Amazon Redshift Spectrum processes any queries while the data remains in your Amazon S3 bucket.


2 Answers

Redshift Spectrum does not support SHOW CREATE TABLE syntax, but there are system tables that can deliver same information. I have to say, it's not as useful as the ready to use sql returned by Athena though.

The tables are

  • svv_external_schemas - gives you information about glue database mapping and IAM roles bound to it
  • svv_external_tables - gives you the location information, and also data format and serdes used
  • svv_external_columns - gives you the column names, types and order information.

Using that data, you could reconstruct the table's DDL.

For example to get the list of columns and their types in the CREATE TABLE format one can do:

select distinct
       listagg(columnname || ' ' || external_type, ',\n') 
             within group ( order by columnnum ) over ()
from svv_external_columns
where tablename = '<YOUR_TABLE_NAME>'
and schemaname = '<YOUR_SCHEM_NAME>'

the query give you the output similar to:

col1 int, 
col2 string,
...

*) I am using listagg window function and not the aggregate function, as apparently listagg aggregate function can only be used with user defined tables. Bummer.

like image 145
botchniaque Avatar answered Nov 10 '22 14:11

botchniaque


I had been doing something similar to @botchniaque's answer in the past, but recently stumbled across a solution in the AWS-Labs' amazon-redshift-utils code package that seems to be more reliable than my hand-spun queries:

amazon-redshift-utils: v_generate_external_tbl_ddl

If you don't have the ability to create a view backed with the ddl listed in that package, you can run it manually by removing the CREATE statement from the start of the query. Assuming you can create it as a view, usage would be:

SELECT ddl
FROM admin.v_generate_external_tbl_ddl
WHERE schemaname = '<external_schema_name>'
    -- Optionally include specific table references:
    --     AND tablename IN ('<table_name_1>', '<table_name_2>', ..., '<table_name_n>')
ORDER BY tablename, seq
;
like image 33
John Stark Avatar answered Nov 10 '22 14:11

John Stark