I have a redshift table and it is storing a lot of data. Every weekend I go and manually using Workbench TRUNCATE last week of data that I no longer need. I manually have to run
DELETE FROM tableName WHERE created_date BETWEEN timeStamp1 AND timeStamp2;
Is it possible to have some way to tell the table or have some expiration policy that removes the data every Sunday for me?
If not, Is there a way to automate the delete process every 7 days? Some sort of shell script or cron job in nodeJS that does this.
Usage notes. The TRUNCATE command commits the transaction in which it is run; therefore, you can't roll back a TRUNCATE operation, and a TRUNCATE command may commit other operations when it commits itself.
To delete rows in a Redshift table, use the DELETE FROM statement: DELETE FROM products WHERE product_id=1; The WHERE clause is optional, but you'll usually want it, unless you really want to delete every row from the table.
Files that you remove by using the CLEANPATH option are permanently deleted and can't be recovered. You can't specify the CLEANPATH option if you specify the ALLOWOVERWRITE option. By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster.
Amazon Redshift automatically sorts data and runs VACUUM DELETE in the background. This lessens the need to run the VACUUM command. For more information, see Vacuuming tables. By default, VACUUM skips the sort phase for any table where more than 95 percent of the table's rows are already sorted.
No, there is no in-built ability to run commands on a regular basis on Amazon Redshift. You could, however, run a script on another system that connects to Redshift and runs the command.
For example, a cron
job that calls psql
to connect to Redshift and execute the command. This could be done in a one-line script.
Alternatively, you could configure an AWS Lambda function to connect to Redshift and execute the command. (You would need to write the function yourself, but there are libraries that make this easier.) Then, you would configure Amazon CloudWatch Events to trigger the Lambda function on a desired schedule (eg once a week).
A common strategy is to actually store data in separate tables per time period (eg a month, but in your case it would be a week). Then, define a view that combines several tables. To delete a week of data, simply drop the table that contains that week of data, create a new table for this week's data, then update the view to point to the new table but not the old table.
By the way...
Your example uses the DELETE
command, which is not the same as the TRUNCATE
command.
TRUNCATE
removes all data from a table. It is an efficient way to completely empty a table.
DELETE
is good for removing part of a table but it simply marks rows as deleted. The data still occupies space on disk. Therefore, it is recommended that you VACUUM
the table after deleting a significant quantity of data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With