Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Redshift with SSIS/SSDT

Has anyone been successful using Amazon Redshift as a source or destination ODBC component in SQL Server Data Tools 2012?

I've installed the PostgreSQL drivers provided by Amazon and have successfully tested a connection in the Windows ODBC driver administrator but keep running into arcane error messages when I choose my saved DSN and try to pull a table listing.

like image 700
Greg Avatar asked Oct 04 '22 17:10

Greg


1 Answers

Redshift is based on quite an old version of Postgres (8.0). Postgres has changed quite a bit since then and the Postgres tools have changed with it. When downloading any tools to use with Redshift you will probably need to use previous versions from several years ago.

The table listing problem is particularly annoying but I have yet to find a version of psql that can properly list Redshift tables. As an alternative you can use the INFORMATION_SCHEMA tables to find this kind of info, and in my opinion this is what SSIS/SSDT should be doing by default.

I would not expect SSIS to be able to load data into Redshift reliably, i.e. create a Redshift destination. This is because Redshift does not really support INSERT INTO as a way to load data. If you use INSERT INTO you will only be able to load ~10 rows per second. Redshift can only load data quickly from S3 or DynamoDB using the COPY command.

It's a similar story for all other ETL tools I've tried, notably the open source tools Pentaho PDI (aka Kettle) and Talend Open Studio. This is particularly annoying in Talend's case as they have Redshift components but they actually try to use INSERT INTO for loading. Even Amazon's own ETL tool Data Pipeline does not yet have support for Redshift as 'node'.

like image 188
Joe Harris Avatar answered Oct 07 '22 19:10

Joe Harris