 

Unload data from Postgres to S3

I'm trying to unload table data from a PostgreSQL database into Amazon S3.

I'm aware that Redshift has an UNLOAD command for S3. Since Redshift is based on PostgreSQL, I tried using the same command against my PostgreSQL database, but was unsuccessful.

Can someone help me with unloading table data from PostgreSQL into S3 periodically?

Firstname asked Mar 10 '17


2 Answers

Redshift is based on PostgreSQL, but there isn't a 1:1 feature correspondence. If you want to load data from a PostgreSQL DB into Redshift through S3, you should:

  1. Unload your data from PostgreSQL to a CSV file. To do that, use psql's \copy command. See also this question here.
  2. Copy the CSV file to S3. There are different ways to do that, but check the documentation here.
  3. Use the COPY command to load the data from S3 into Redshift.
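A minimal sketch of those three steps as a shell script. All the table, bucket, host, and role names below are placeholders you would replace with your own; it assumes the AWS CLI is configured and the Redshift cluster has an IAM role that can read the bucket:

```shell
#!/bin/sh
# Sketch only: table, bucket, hosts, and the IAM role ARN are placeholders.

# 1. Export the table from PostgreSQL to CSV.
#    \copy runs client-side, so it needs no superuser rights on the server.
psql "host=my-pg-host dbname=mydb user=mypguser" \
  -c "\copy mytable TO '/tmp/mytable.csv' WITH (FORMAT csv, HEADER)"

# 2. Upload the CSV to S3.
aws s3 cp /tmp/mytable.csv s3://my-bucket/exports/mytable.csv

# 3. Load it into Redshift with COPY (run against the Redshift endpoint).
psql "host=myhost.redshift.amazonaws.com port=5439 dbname=mydb user=myrsuser" \
  -c "COPY redshift_schema.redshift_table
      FROM 's3://my-bucket/exports/mytable.csv'
      IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
      FORMAT CSV IGNOREHEADER 1;"
```

Since you want this to run periodically, a crontab entry such as `0 2 * * * /path/to/export_to_s3.sh` would run the export nightly.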
cpard answered Oct 23 '22


On Redshift you can create a table to receive the data:

CREATE TABLE redshift_schema.redshift_table (...);

Then create a foreign data wrapper, a server, and a foreign table in PostgreSQL RDS that mirrors the Redshift table:

CREATE EXTENSION postgres_fdw;

-- give the wrapper a Redshift-specific name on top of the postgres_fdw handler
CREATE FOREIGN DATA WRAPPER redshift_fdw
HANDLER postgres_fdw_handler
VALIDATOR postgres_fdw_validator;

CREATE SERVER redshift_server_mydb
FOREIGN DATA WRAPPER redshift_fdw
OPTIONS (dbname 'mydb', port '5439', connect_timeout '200000', host 'myhost.redshift.amazonaws.com');

CREATE USER MAPPING FOR mypguser
SERVER redshift_server_mydb
OPTIONS (user 'myrsuser', password 'mypassword');

IMPORT FOREIGN SCHEMA redshift_schema 
LIMIT TO (redshift_table) 
FROM SERVER redshift_server_mydb
INTO postgresql_schema;

Now, in PostgreSQL, you can (in a function if you like) read and write the Redshift table (SELECT, INSERT, UPDATE, DELETE) through the foreign table, without using dblink:

INSERT INTO postgresql_schema.redshift_table
SELECT *
FROM postgresql_schema.postgresql_table;

Now when you look at the Redshift table, all the data is there, and you can UNLOAD the table to S3 as required.
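The final UNLOAD step, run against the Redshift cluster, would look something like this (the bucket name and IAM role ARN are placeholders):

```sql
-- Sketch only: bucket and IAM role ARN are placeholders.
UNLOAD ('SELECT * FROM redshift_schema.redshift_table')
TO 's3://my-bucket/exports/redshift_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV
HEADER
ALLOWOVERWRITE;
```

Note that the `TO` value is a prefix, not a single file name: Redshift writes one or more file parts under it, in parallel by default.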

S Wright answered Oct 23 '22