We run aggregations on huge datasets in Amazon Redshift, and we also keep a relatively small amount of data in MySQL. Some of the joins in Redshift need that MySQL data. What is the best way to synchronize the MySQL data to Redshift? Does Redshift have anything like the remote view in Oracle? Or should I programmatically query MySQL and insert / update in Redshift?
Redshift now supports loading data from remote hosts via SSH. This technique involves:
The manifest specifies an arbitrary command to run on the remote host; that command prints text output in a format suitable for ingestion by the Redshift COPY command.
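As a rough illustration of the manifest and the COPY statement, here is a minimal Python sketch. The host name, user, table, bucket, and IAM role are all hypothetical placeholders, and the `mysql` command assumes the client can find its credentials in an option file on the remote host. The manifest keys (`endpoint`, `command`, `mandatory`, `username`, `publickey`) and the `SSH` keyword are the ones I understand Redshift to expect, but check them against the current documentation before relying on this.

```python
import json


def build_ssh_manifest(endpoint, username, command, public_key=None, mandatory=True):
    """Return the JSON text of a COPY-from-SSH manifest with a single host entry."""
    entry = {
        "endpoint": endpoint,    # host name or IP of the machine Redshift connects to
        "command": command,      # command whose stdout Redshift will ingest
        "mandatory": mandatory,  # fail the COPY if this host cannot be reached
        "username": username,    # login user on the remote host
    }
    if public_key is not None:
        entry["publickey"] = public_key  # optional host public key
    return json.dumps({"entries": [entry]}, indent=2)


def build_ssh_copy(table, manifest_s3_uri, iam_role_arn):
    """Return a COPY statement that reads tab-delimited text through the SSH manifest."""
    return (
        f"COPY {table} FROM '{manifest_s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "DELIMITER '\\t' SSH;"
    )


if __name__ == "__main__":
    # Hypothetical values throughout; --batch prints tab-separated rows,
    # --skip-column-names drops the header line.
    manifest = build_ssh_manifest(
        endpoint="mysql-host.example.com",
        username="ec2-user",
        command='mysql --batch --skip-column-names -e "SELECT id, name FROM mydb.lookup"',
    )
    print(manifest)
    print(build_ssh_copy(
        "lookup",
        "s3://my-bucket/ssh_manifest.json",
        "arn:aws:iam::123456789012:role/MyRedshiftRole",
    ))
```

The manifest file itself is uploaded to S3, and the COPY statement points at that S3 location rather than at the host directly.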
When MySQL data is required for joins in Redshift, we usually just copy it over from MySQL to Redshift.
It implies:
Steps 2 to 4 can be scripted, and allow you to send fresh data over to Redshift when necessary or regularly.
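One way such a script might look is sketched below in Python. It is only an assumption about the workflow (serialize the MySQL rows to CSV, put the file on S3, then COPY it into Redshift), not necessarily the exact steps meant above. The table name, bucket, and IAM role are hypothetical, and fetching the rows from MySQL and uploading the file to S3 are left out because they depend on your client libraries.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of row tuples to CSV text that COPY ... CSV can read."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def build_csv_copy(table, s3_uri, iam_role_arn):
    """Return a COPY statement that loads a CSV file from S3 into the given table."""
    return f"COPY {table} FROM '{s3_uri}' IAM_ROLE '{iam_role_arn}' CSV;"


if __name__ == "__main__":
    # Rows as they might come back from a MySQL cursor (hypothetical data).
    rows = [(1, "alpha"), (2, "beta, with comma")]
    print(rows_to_csv(rows))
    # After uploading the CSV text to the S3 location below, run this in Redshift.
    print(build_csv_copy(
        "lookup",
        "s3://my-bucket/lookup.csv",
        "arn:aws:iam::123456789012:role/MyRedshiftRole",
    ))
```

Running this from cron or another scheduler gives the "regularly" part; for a small lookup table, reloading the whole table each time is usually simpler than tracking individual inserts and updates.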