I am looking to Copy data within databases on Amazon Redshift. Before this, I was copying data from a Redshift database to a PostgreSQL hosted on an EC2 instance for analytical purpose. I had ruby script that would do it using dblink EXTENSION.
But now since the data is un-managable on PostgreSQL instance, we have decided to copy the data to a separate database, on the same Redshift cluster.
Bump! What I found was that dblink is not available for Postgre 8.0 (Redshift Version). Is there a way I can get this working on Redshift.
I don't want to use the Copy command and add a layer of S3 in between or even data-pipeline. You see, it's the same cluster, just different databases.
In case, there is no other way, shall I use data-pipeline or Copy to S3 and then back to Redshift. Or is it essentially the same in the end?
P.S. Any help/leads would be appreciated. I would do my hard work. Just need a heads up.
This seems now possible (since June 3, 2016)
See: https://blogs.aws.amazon.com/bigdata/post/Tx1GQ6WLEWVJ1OX/JOIN-Amazon-Redshift-AND-Amazon-RDS-PostgreSQL-WITH-dblink
In the above article, they give an example to copy from redshift to an other database:
I'll annotate with (postgres cluster) and (redshift cluster) for clarity
Copying data from Amazon Redshift to RDS PostgreSQL
The SQL to create the table: (postgres cluster)
CREATE TABLE sales_summary (
saletime timestamp,
sellerid int,
sales decimal
);
[...]
The following query shows how to insert data into the tables using the dblink function:
copy is from (redshift cluster) to (postgres cluster)
INSERT INTO sales_summary
SELECT *
FROM dblink('foreign_server',$REDSHIFT$
SELECT date_trunc('hours', saletime) AS ts, sellerid, sum(pricepaid) sales
FROM sales
WHERE saletime >= '2008-01-01'
AND saletime < '2008-02-01'
GROUP BY ts, sellerid
ORDER BY sales
$REDSHIFT$) AS t1 (saletime timestamp, sellerid int, sales decimal);
Edit: I've used this in production to load tables with many millions of rows daily, and I haven't had a single issue related to it. So I would recommend it as the preferred method for loading from Redshift to RDS-postgres (As opposed to COPYing, or worse, loading the data in memory with an ORM)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With