
Is there a way to do a SQL dump from Amazon Redshift


Could you use the SQL workbench/J client?

Elm asked Mar 15 '13

People also ask

How do I extract data from Amazon Redshift?

The first method of extracting data from AWS Redshift through SQL involves transfers to Amazon S3 files, part of Amazon Web Services. You run the process by unloading AWS data into S3 buckets and then using SSIS (SQL Server Integration Services) to copy the data into SQL Server.
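
For example, a minimal UNLOAD looks like this (the table name, bucket path, and credentials are placeholders, not values from this page):

unload ('select * from my_table')
to 's3://MY-BUCKET/exports/my_table_'
credentials 'aws_access_key_id=KEY;aws_secret_access_key=SECRET'
delimiter ','
gzip;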

Is Amazon Redshift a SQL database?

Amazon Redshift is built around industry-standard SQL, with added functionality to manage very large datasets and support high-performance analysis and reporting of those data.

Can you query Redshift with SQL?

Amazon Redshift supports SQL client tools connecting through Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC). Amazon Redshift doesn't provide or install any SQL client tools or libraries, so you must install them on your client computer or Amazon EC2 instance to use them.
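
For example, a Redshift JDBC connection URL takes this general form (the cluster endpoint and database name below are placeholders):

jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev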


3 Answers

pg_dump of schemas may not have worked in the past, but it does now.

pg_dump -Cs -h my.redshift.server.com -p 5439 database_name > database_name.sql

CAVEAT EMPTOR: pg_dump still produces some Postgres-specific syntax, and it also neglects the Redshift SORTKEY and DISTSTYLE definitions for your tables.
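
In practice that means adding those definitions back by hand after the dump. A sketch of what a corrected CREATE TABLE should include (the table and column names are illustrative):

create table events (
  event_id   bigint not null,
  user_id    bigint,
  event_time timestamp
)
diststyle key
distkey (user_id)
sortkey (event_time);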

Another decent option is to use the published AWS admin script views for generating your DDL. It handles SORTKEY/DISTSTYLE, but I've found it to be buggy when it comes to capturing all FOREIGN KEYs, and it doesn't handle table permissions/owners. Your mileage may vary.
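
For reference, the relevant view is v_generate_tbl_ddl from the amazon-redshift-utils repository. Assuming you've installed it into an admin schema (its conventional home), you can pull one table's DDL like this:

select ddl
from admin.v_generate_tbl_ddl
where schemaname = 'public'
  and tablename = 'my_table'
order by seq;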

To get a dump of the data itself, you unfortunately still need to run the UNLOAD command on each table.

Here's a way to generate those statements. Be aware that the select * syntax will fail if your destination table does not have the same column order as your source table:

select
  ist.table_schema,
  ist.table_name,
  'unload (''select col1,col2,etc from "' || ist.table_schema || '"."' || ist.table_name || '"'')
to ''s3://SOME/FOLDER/STRUCTURE/' || ist.table_schema || '.' || ist.table_name || '__''
credentials ''aws_access_key_id=KEY;aws_secret_access_key=SECRET''
delimiter as '',''
gzip
escape
addquotes
null as ''''
--encrypted
--parallel off
--allowoverwrite
;'
from information_schema.tables ist
where ist.table_schema not in ('pg_catalog', 'information_schema')
order by ist.table_schema, ist.table_name
;
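
One way to run the generator (a sketch; the host, database, and file names are placeholders): save the query above to a file, execute it through psql in tuples-only unaligned mode, and capture the emitted UNLOAD statements into a script you can then run against the cluster:

$> psql -t -A -h my.redshift.server.com -p 5439 -d database_name -f generate_unloads.sql > unload_all.sql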
mattmc3 answered Oct 01 '22

We are currently using Workbench/J successfully with Redshift.

Regarding dumps, at the time of writing there is no schema export tool available in Redshift (pg_dump doesn't work), although data can always be extracted via queries.

Hope it helps.

EDIT: Remember that things like sort and distribution keys are not reflected in the code generated by Workbench/J. Take a look at the system table pg_table_def to see info on every field, including whether a field is the sortkey or distkey. Documentation on that table:

http://docs.aws.amazon.com/redshift/latest/dg/r_PG_TABLE_DEF.html
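
For example ('my_table' is a placeholder; note that pg_table_def only shows tables in schemas on your search_path):

select "column", type, distkey, sortkey
from pg_table_def
where tablename = 'my_table';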

nandilugio answered Oct 01 '22


Yes, you can do this in several ways.

  1. UNLOAD to an S3 bucket. That's the best option; you can get your data onto almost any other machine from there. (More info here: http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html)

  2. Pipe the contents of your table to a data file using the Linux instance you have. Running the following will do the trick:

    $> psql -t -A -F 'your_delimiter' -h 'hostname' -d 'database' -U 'user' -c "select * from myTable" >> /home/userA/tableDataFile

Utsav Jha answered Oct 01 '22