I have a Redshift cluster of 4 nodes.
Thanks a lot!
In case of node failure(s), Amazon Redshift automatically provisions new node(s) and begins restoring data from other drives within the cluster or from Amazon S3. It prioritizes restoring your most frequently queried data so your most frequently executed queries will become performant quickly.
Because Amazon Redshift distributes and runs queries in parallel across all of a cluster's compute nodes, you can increase query performance by adding nodes to your cluster.
Fault tolerant: There are multiple features that enhance the reliability of your data warehouse cluster. For example, Amazon Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance.
If it is a single node failure, Amazon will start a new node and stream data to it from the other nodes (each block is written to two different nodes, when the cluster has more than one). In such a case, we can expect:
If more than one node fails, Redshift will restore itself from the latest S3 backup. S3 backups are taken on the following occasions:
It just happened to my cluster: one of the nodes failed. It took almost 20 minutes for the failure to show up in the dashboard (the 'Performance' tab showed unhealthy, while the 'Status' tab still showed healthy).
About 1 hour after the initial failure, the cluster changed its state to 'modifying', and after another hour a new node was in place.
There is a message in 'Recent Events':
A node on Amazon Redshift cluster 'xxx' was automatically replaced at 2013-12-18 11:42 UTC. The cluster is now operating normally.
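The same events can also be read from a script instead of the console. This is a minimal sketch, assuming a boto3-style Redshift client is passed in; `describe_events` with `SourceIdentifier`, `SourceType` and `Duration` is the boto3 call, while the function name `recent_cluster_events` is made up for illustration. It reads only the first page of results.

```python
def recent_cluster_events(client, cluster_id, minutes=1440):
    """Return (date, message) pairs for recent events of one cluster.

    Assumption: `client` behaves like boto3.client("redshift").
    Only the first page of results is read; pagination is ignored.
    """
    response = client.describe_events(
        SourceIdentifier=cluster_id,
        SourceType="cluster",
        Duration=minutes,  # look-back window, in minutes
    )
    return [(event["Date"], event["Message"]) for event in response.get("Events", [])]
```

With boto3 installed and credentials configured, it could be called as `recent_cluster_events(boto3.client("redshift"), "xxx")`, where `xxx` stands for your cluster identifier.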
For the whole time the cluster was unavailable: no queries could be run and no imports were possible.
The data is exactly the same as it was at the moment of the failure.
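Since the cluster can be unusable for a couple of hours, load jobs may want to wait for it rather than fail. Below is a rough sketch under assumptions: `client` is a boto3-style Redshift client, the status is read from `describe_clusters` as `ClusterStatus`, and the function name, poll interval and timeout are arbitrary choices of mine. Note that, as described above, the reported status lagged behind the real failure, so an "available" result early on may not be conclusive.

```python
import time


def wait_until_available(client, cluster_id, poll_seconds=60,
                         timeout_seconds=4 * 3600, sleep=time.sleep):
    """Poll the cluster status until it reports 'available'.

    Assumption: `client` behaves like boto3.client("redshift").
    Returns the number of seconds spent sleeping; raises TimeoutError
    if the cluster is still not available after `timeout_seconds`.
    """
    waited = 0
    while True:
        response = client.describe_clusters(ClusterIdentifier=cluster_id)
        status = response["Clusters"][0]["ClusterStatus"]
        if status == "available":
            return waited
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"cluster {cluster_id!r} still {status!r} after {waited}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so the loop can be exercised without real delays; in normal use the default is fine.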