Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to execute scheduled SQL script on Amazon Redshift?

I have series of ~10 queries to be executed every hour automatically in Redshift (maybe report success/failure).

Most queries are aggregation on my tables.

I have tried using AWS Lambda with CloudWatch Events, but Lambda functions only survive for 5 minutes max and my queries can take up to 25 minutes.

like image 977
Rahul Gupta Avatar asked Mar 02 '17 20:03

Rahul Gupta


People also ask

How do I run a Redshift query in AWS?

To use the query editor on the Amazon Redshift console On the navigation menu, choose Query editor, then connect to a database in your cluster. For Schema, choose public to create a new table based on that schema. Enter the following in the query editor window and choose Run to create a new table.

Which node is used to run SQL queries in Amazon Redshift?

The Query Editor provides an in-browser interface for running SQL queries on Amazon Redshift clusters. For queries that are run on compute nodes, you can then view the query results and query execution plan next to your queries.


2 Answers

i had the same problem in the past,

you can use R or Python for that.

i used R , you can install package RpostgreSQL and connecting to your Redshift attached example:

drv <- dbDriver("PostgreSQL")
conn <-dbConnect(drv,host='mm-stats-1.ctea4hmr4vlw.us-east-1.redshift.amazonaws.com',port='5439',dbname='stats',user='xxx',password='yyy')

and then you can build report with markdown and then scheduled it with crontab task.

also i used mailR package to send the report to other users

like image 90
user3600910 Avatar answered Nov 07 '22 19:11

user3600910


It's kind of strange that AWS doesn't provide a simple distributed cron style service. It would be useful for so many things. There is SWF, but the timing/scheduling aspect is left up to the user. You could use Lambda/Cloudwatch to trigger SWF events. That's a lot of overhead to get reasonable cron like activity.

Like the comment says the easiest way would be to run a small instance and host cron jobs there. Use an autoscale group of 1 for some reliability. A similar but more complicated approach is to use elastic beanstalk.

If you really want redundancy, reliability, visibility, etc. it might be worth looking at a third party solution like Airflow. There are many others depending on your language of preference.

Here's a similar question with more info.

like image 26
systemjack Avatar answered Nov 07 '22 21:11

systemjack