I'm doing something along the lines of:
import sqlalchemy

conn_string = "postgresql+pg8000://%s:%s@%s:%d/%s" % (db_user, db_pass, host, port, schema)
conn = sqlalchemy.engine.create_engine(conn_string, execution_options={'autocommit': True},
                                       encoding='utf-8', isolation_level="AUTOCOMMIT")
rows = conn.execute(sql_query)
To run queries on a Redshift cluster. Lately, I've been doing maintenance tasks such as running vacuum reindex
on large tables that get truncated and reloaded every day.
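For reference, the query passed to execute() is essentially just the VACUUM REINDEX statement itself (the table name here is a placeholder):
sql_query = "VACUUM REINDEX my_big_table;"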
The problem is that the command above takes around 7 minutes for a particular table (the table is huge: 60 million rows across 15 columns), and when I run it using the method above it never finishes and just hangs. I can see in the cluster dashboard in AWS that parts of the vacuum command run for about 5 minutes and then it just stops. No Python errors, no errors on the cluster, nothing at all.
My guess is that the connection is lost during the command. So, how do I prove my theory? Has anybody else run into this? What do I change in the connection string to keep it alive longer?
EDIT:
I changed my connection to this after the comments here:
conn = sqlalchemy.engine.create_engine(conn_string,
                                       execution_options={'autocommit': True},
                                       encoding='utf-8',
                                       connect_args={"keepalives": 1, "keepalives_idle": 60,
                                                     "keepalives_interval": 60},
                                       isolation_level="AUTOCOMMIT")
And it has been working for a while. However, it has started showing the same behaviour for even larger tables, where the vacuum reindex actually takes around 45 minutes (at least, that is my estimate; the command never finishes running in Python).
How can I make this work regardless of the query runtime?
It's most likely not a connection drop issue. To confirm this, try pushing a few million rows into a dummy table (something that takes more than 5 minutes) and see if the statement fails. Once a query has been submitted to Redshift, it keeps executing in the background regardless of your connection being shut down.
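One way to verify this is to open a second connection while the original session appears to hang and look at Redshift's system views. A rough sketch, reusing conn_string from the question and assuming SQLAlchemy 1.x as in the original code; SVV_VACUUM_PROGRESS and STV_RECENTS are standard Redshift system views:
import sqlalchemy

# Separate connection, opened while the original session appears to hang.
engine = sqlalchemy.engine.create_engine(conn_string, isolation_level="AUTOCOMMIT")

# Is the vacuum still making progress server-side?
for row in engine.execute("SELECT table_name, status, time_remaining_estimate "
                          "FROM svv_vacuum_progress;"):
    print(row)

# What statements is the cluster currently running?
for row in engine.execute("SELECT pid, duration, TRIM(query) AS query "
                          "FROM stv_recents WHERE status = 'Running';"):
    print(row)
If the vacuum shows up there after your Python call has hung or timed out, the statement is still running on the cluster and the problem is on the client side.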
Now, coming to the problem itself: my guess is that you are running out of memory or disk space. Could you elaborate and list your Redshift setup (how many dc1/ds2 nodes)? Also, try running some admin queries and see how much space you have left on disk. Sometimes when the cluster is filled to the brim a disk-full error is thrown, but in your case the connection might be dropping long before that error ever reaches your Python shell.
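For example, something along these lines gives a per-node picture of disk usage (a rough sketch reusing the engine from the question; STV_PARTITIONS is a standard Redshift system view that reports raw partition usage in 1 MB blocks):
disk_usage_sql = """
    SELECT owner AS node,
           SUM(capacity) / 1024 AS capacity_gb,
           SUM(used) / 1024 AS used_gb,
           (SUM(used)::float / SUM(capacity)) * 100 AS pct_used
    FROM stv_partitions
    GROUP BY owner
    ORDER BY owner;
"""
for row in conn.execute(disk_usage_sql):
    print(row)
Keep in mind that VACUUM REINDEX needs substantial temporary space, so a node that looks nearly full at rest can still run out of disk mid-vacuum.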