I have a Python script that uses psycopg2 to execute a COPY command to copy data from S3 to Redshift. It runs fine on a cron schedule.
Now I want to check that the data has loaded properly each time by querying the STL_LOAD_COMMITS and STL_LOAD_ERRORS tables.
Does anyone know if there is a way of getting the query ID returned from the COPY command so it can be used to query the tables above and retrieve the relevant log record?
I don't believe COPY returns anything at all, but if someone has come across a clever way of checking loads in code, I'd be interested.
EDIT: Perhaps the right way to do this is to query using the filename instead of the query ID since I know the names of the files I've loaded.
select *
from STL_LOAD_COMMITS
where filename in ('s3://bucket/4f737c05-8f16-4ba7-8f50-30423369c389.csv.gz',
's3://bucket/5fe4fea9-a9e4-4622-b9f6-ed3f98f7d1e2.csv.gz')
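From Python, the filename check above could be sketched roughly as follows. This is only an illustration, not a tested production approach: `find_missing_files` is a made-up helper name, `cursor` is assumed to be a cursor from an open psycopg2 connection to the cluster, and it relies on psycopg2 adapting a Python tuple into an SQL `IN (...)` list.

```python
def find_missing_files(cursor, filenames):
    """Return the expected S3 files that have no row in STL_LOAD_COMMITS.

    `cursor` is assumed to be a psycopg2 cursor (hypothetical helper).
    """
    expected = set(filenames)
    if not expected:
        # An empty tuple would render as "IN ()", which is invalid SQL.
        return set()

    # psycopg2 expands a tuple parameter into a parenthesised value list.
    cursor.execute(
        "select trim(filename) from stl_load_commits where filename in %s",
        (tuple(expected),),
    )
    # filename is a fixed-width column, so strip any padding defensively.
    loaded = {row[0].strip() for row in cursor.fetchall()}
    return expected - loaded
```

An empty result would mean every file has at least one commit record. Note that this matches on filename only, so a file loaded by an earlier run under the same name would also count as loaded.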
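PG_LAST_COPY_ID() returns, as the name suggests, the query ID of the most recently completed COPY command in the current session, so call it on the same connection that ran the COPY.
Source: AWS Redshift documentation for PG_LAST_COPY_ID()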
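In a psycopg2 script this might look something like the sketch below. It is a minimal example under some assumptions: `check_last_copy` is a hypothetical helper name, `cursor` must belong to the same connection (session) that just ran the COPY, and the columns selected are only a small subset of what STL_LOAD_COMMITS and STL_LOAD_ERRORS contain.

```python
def check_last_copy(cursor):
    """Return (query_id, committed_files, errors) for this session's last COPY.

    `cursor` is assumed to be a psycopg2 cursor on the connection that
    executed the COPY (hypothetical helper).
    """
    cursor.execute("select pg_last_copy_id()")
    query_id = cursor.fetchone()[0]
    if query_id == -1:
        # -1 means no COPY has been run in this session.
        raise RuntimeError("no COPY has completed in this session")

    # Files committed by that COPY (there can be several rows per file).
    cursor.execute(
        "select trim(filename) from stl_load_commits where query = %s",
        (query_id,),
    )
    committed_files = sorted({row[0] for row in cursor.fetchall()})

    # Any load errors recorded against the same query ID.
    cursor.execute(
        "select trim(filename), line_number, trim(err_reason) "
        "from stl_load_errors where query = %s",
        (query_id,),
    )
    errors = cursor.fetchall()
    return query_id, committed_files, errors
```

Calling this right after `cursor.execute(copy_sql)` on the same cursor would give the query ID plus the matching log rows, which can then be compared against the list of files you expected to load.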