I have a large set of data on S3 in the form of a few hundred CSV files that are ~1.7 TB in total (uncompressed). I am trying to copy it to an empty table on a Redshift cluster.
The cluster is empty (no other tables) and has 10 dw2.large nodes. If I set a sort key on the table, the COPY command uses up all available disk space about 25% of the way through and aborts. If there's no sort key, the copy completes successfully and never uses more than 45% of the available disk space. This behavior is consistent whether or not I also set a distribution key.
I don't really know why this happens, or if it's expected. Has anyone seen this behavior? If so, do you have any suggestions for how to get around it? One idea would be to try importing each file individually, but I'd love to find a way to let Redshift deal with that part itself and do it all in one query.
Amazon Redshift stores your data on disk in sorted order according to the sort key. The Amazon Redshift query optimizer uses sort order when it determines optimal query plans. When you use automatic table optimization, you don't need to choose the sort key of your table.
As records enter the system, Amazon Redshift transparently converts the data to columnar storage for each of the columns. In this simplified example, using columnar storage, each data block holds column field values for as many as three times as many records as row-based storage.
Amazon Redshift supports two types of sort keys: compound sort keys and interleaved sort keys. Selecting the right kind requires knowledge of the queries that you plan to run.
Redshift allows designating multiple columns as SORTKEY columns, but most of the best-practices documentation is written as if there were only a single SORTKEY.
Got an answer to this from the Redshift team. The cluster needs free space of at least 2.5x the incoming data size to use as temporary space for the sort. You can upsize your cluster, copy the data, and resize it back down.
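To make the 2.5x rule concrete, here is a rough sizing sketch in Python. It only encodes the arithmetic from the answer above. The per-node capacity of 0.16 TB is an assumption used for illustration (check the actual storage of your node type), and the function names are hypothetical. The answer does not say whether "incoming data size" means the uncompressed size or the on-disk size, so treat the result as an estimate.

```python
import math

# Headroom factor quoted in the answer above: free space must be at
# least 2.5x the incoming data size when the table has a sort key.
SORT_HEADROOM_FACTOR = 2.5


def required_free_space_tb(incoming_tb, factor=SORT_HEADROOM_FACTOR):
    """Free space (TB) the cluster should have before the COPY starts."""
    return incoming_tb * factor


def min_nodes(incoming_tb, per_node_tb, factor=SORT_HEADROOM_FACTOR):
    """Smallest node count whose combined storage covers the headroom.

    per_node_tb is the usable storage per node; the caller supplies it,
    since it depends on the node type.
    """
    return math.ceil(required_free_space_tb(incoming_tb, factor) / per_node_tb)


if __name__ == "__main__":
    incoming_tb = 1.7   # total size of the CSV files, from the question
    per_node_tb = 0.16  # ASSUMPTION: storage per node, for illustration only
    print(f"free space needed: {required_free_space_tb(incoming_tb):.2f} TB")
    print(f"nodes needed for the load: {min_nodes(incoming_tb, per_node_tb)}")
```

Under these assumptions the estimate is well above what 10 nodes provide, which is consistent with the COPY running out of disk partway through. Once the load finishes, the cluster can be resized back down as the answer suggests.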