
Redshift disk space vs number of nodes

I am currently using AWS Redshift service to store data. The data size is about to hit 100% of disk space.

  1. Will adding nodes and changing from Single-node to Multi-nodes increase the disk size?

  2. Is moving from dc1.xlarge to bigger nodes such as dc1.8xlarge the only way to increase the disk space?

  3. If I move to Multi-nodes, will the data be split or just mirrored so that both nodes will have the same data?

Aung Myint Thein asked Jan 09 '17 13:01


1 Answer

Redshift is a distributed columnar data warehouse solution. The key here is "distributed". Unlike traditional databases, Redshift is designed to scale out by adding nodes to the cluster. Adding nodes adds disk space as well as computing horsepower. To answer your questions:

  1. Will adding nodes and changing from Single-node to Multi-nodes increase the disk size?

    Generally speaking, yes. When storing data in Redshift, you should choose a distribution key (a single column per table) that distributes your data evenly across the nodes. As a general principle, tables that are frequently joined should share the same distribution key column, so that matching rows land on the same node. Note that tables configured with a distribution style of ALL are replicated in full on every node, so they do not gain space from extra nodes; limit DISTSTYLE ALL to small dimension tables.

  2. Is moving from dc1.xlarge to bigger nodes such as dc1.8xlarge the only way to increase the disk space?

    No; see the answer to question 1 above. You can also choose a different node type depending on your requirements. DC1 (dense compute) nodes are compute-optimized; they have smaller but faster SSD storage. DS1 (dense storage) nodes provide significantly more disk space per node.

  3. If I move to Multi-nodes, will the data be split or just mirrored so that both nodes will have the same data?

    See the answer to Q1 above. The data is split, not mirrored: when you add nodes to your Redshift cluster, Redshift redistributes your data across all nodes according to the distribution style of each table. Only tables with a distribution style of ALL are copied in full to every node.
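
To make the split-versus-mirrored distinction concrete, here is a rough Python simulation of how many rows each node would hold under the KEY, EVEN, and ALL distribution styles as the node count grows. It is only a sketch under stated assumptions: `node_for_key` and `rows_per_node` are hypothetical helpers, SHA-256 stands in for Redshift's internal hash function (which is not public), and real clusters distribute rows across slices within each node rather than directly across nodes.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a distribution key value to a node index.

    Assumption: SHA-256 is only a stand-in for Redshift's internal hash.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rows_per_node(keys, num_nodes, dist_style="KEY"):
    """Return {node_index: row_count} for a table under a distribution style."""
    counts = {node: 0 for node in range(num_nodes)}
    if dist_style == "ALL":
        # Every node stores a full copy of the table.
        for node in counts:
            counts[node] = len(keys)
    elif dist_style == "EVEN":
        # Rows are dealt out round-robin, regardless of their values.
        for i, _ in enumerate(keys):
            counts[i % num_nodes] += 1
    elif dist_style == "KEY":
        # Rows with the same key value always land on the same node.
        for key in keys:
            counts[node_for_key(key, num_nodes)] += 1
    else:
        raise ValueError(f"unknown distribution style: {dist_style}")
    return counts


# Hypothetical table with 10,000 distinct key values.
customer_ids = [f"customer-{i}" for i in range(10_000)]

for num_nodes in (1, 2, 4):
    for style in ("KEY", "EVEN", "ALL"):
        counts = rows_per_node(customer_ids, num_nodes, style)
        print(f"{num_nodes} node(s), {style}: busiest node holds {max(counts.values())} rows")
```

With KEY or EVEN distribution the busiest node's share shrinks as nodes are added, which is why scaling out frees disk space; with ALL every node keeps the full table, so extra nodes do not help for those tables.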

PS: I would highly recommend reading through the Redshift documentation. Start at "Are You a First-Time Amazon Redshift User?"

References: "Choosing a Data Distribution Style"

DotThoughts answered Sep 26 '22 20:09