I like to set up tools and services with production, staging, and local development. I'd like to use Amazon Redshift, and starting at $180 a month seems pretty reasonable for a columnar store database, but do I actually have to think about it as $180 x # of environments / month? Is there any way to have a free staging and local environment for Redshift?
It's also nice to be able to do development against a local instance rather than relying on the network. I assume that's not possible with Redshift.
What do you do to make local development easier, faster and cheaper when working with Redshift?
Redshift is a managed service, so you cannot run it locally unless you want to buy AWS Outposts.
The staging table is a temporary table that holds all of the data that will be used to make changes to the target table, including both updates and inserts. A merge operation requires a join between the staging table and the target table.
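As a rough local illustration of that staging-table merge, here is a minimal sketch that runs against an in-memory SQLite database from Python's standard library. SQLite is only a stand-in and is not Redshift-compatible in general; the table names `target` and `staging`, the columns, and the helper `merge_from_staging` are hypothetical. It uses the delete-then-insert form of a merge, with the match between the two tables expressed as an `IN` subquery.

```python
import sqlite3


def merge_from_staging(conn):
    """Replace target rows that also appear in staging, then add the new ones."""
    # "with conn" commits both statements together, or rolls back on error.
    with conn:
        # Remove target rows that staging will replace (the update half).
        conn.execute("DELETE FROM target WHERE id IN (SELECT id FROM staging)")
        # Insert every staging row: replacements plus brand-new rows.
        conn.execute("INSERT INTO target (id, value) SELECT id, value FROM staging")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "keep")])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, "new"), (3, "added")])

merge_from_staging(conn)
rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'keep'), (3, 'added')]
```

Row 1 is updated, row 2 is left alone, and row 3 is inserted, all within one transaction.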
Data staging enables you to copy data from the input data node to the resource executing the activity, and, similarly, from the resource to the output data node. The staged data on the Amazon EMR or Amazon EC2 resource is available by using special variables in the activity's shell commands or Hive scripts.
Redshift is a type of OLAP database. On the other hand, OLTP databases are great for cases where your data is written to the database as often as it is being read from it. As the name suggests, a common use case for this is any transactional data.
Amazon Redshift was specifically created to run on AWS infrastructure. It is not available as a download. (Interestingly, Amazon DynamoDB does have a downloadable version for development purposes.)
The cheapest option might be to shut down your Dev & Test clusters each night and on weekends. Take a final snapshot when deleting the cluster, then create a cluster the next morning from that snapshot. This can be automated via the AWS Command-Line Interface (CLI), making it easy to schedule with cron or Scheduled Tasks.
You could also keep a snapshot of Test data and restore that same snapshot each morning, which means the test database doesn't fill up with test cases.
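The nightly teardown and morning restore could be scripted along these lines. This is a sketch, not a tested deployment script: it only builds the AWS CLI argument lists and prints them, and the cluster name `dev-cluster`, the snapshot names, and the helper functions are hypothetical. The `delete-cluster` and `restore-from-cluster-snapshot` subcommands and the flags shown are the standard `aws redshift` ones.

```python
import subprocess
from datetime import date


def delete_cluster_cmd(cluster_id, final_snapshot_id=None):
    """Build the CLI call that deletes a cluster, optionally snapshotting it first."""
    cmd = ["aws", "redshift", "delete-cluster", "--cluster-identifier", cluster_id]
    if final_snapshot_id is None:
        # Fixed-baseline workflow: discard the day's changes.
        cmd.append("--skip-final-cluster-snapshot")
    else:
        # Nightly workflow: keep the day's changes in a final snapshot.
        cmd += ["--final-cluster-snapshot-identifier", final_snapshot_id]
    return cmd


def restore_cluster_cmd(cluster_id, snapshot_id):
    """Build the CLI call that recreates a cluster from a snapshot."""
    return [
        "aws", "redshift", "restore-from-cluster-snapshot",
        "--cluster-identifier", cluster_id,
        "--snapshot-identifier", snapshot_id,
    ]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    snapshot = f"dev-cluster-{date.today().isoformat()}"
    run(delete_cluster_cmd("dev-cluster", snapshot))   # evening job
    run(restore_cluster_cmd("dev-cluster", snapshot))  # next morning's job
```

In practice the two calls would live in separate cron entries, one in the evening and one in the morning. For the fixed test-data variant, call `delete_cluster_cmd("dev-cluster")` with no snapshot name and always restore from the same baseline snapshot. Snapshot identifiers must be unique, so old nightly snapshots need to be cleaned up separately.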
Another cost saving might be to reduce the number of nodes for the non-production systems. Queries will run slower and the total amount of storage will be reduced, but it could be more cost-effective. You could even use a single "Dense Storage" 2TB node instead of several "Dense Compute" SSD nodes, which gives you more storage on fewer nodes.
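To get a feel for what the scheduling approach saves, here is a back-of-the-envelope estimate. It assumes the $180/month figure from the question is an on-demand price that scales linearly with the hours the cluster is running, and it ignores snapshot storage charges; the 10-hours-a-day, 5-days-a-week schedule and the function name are illustrative assumptions only.

```python
HOURS_PER_WEEK = 7 * 24


def monthly_cost(full_time_cost, hours_per_day=24, days_per_week=7):
    """Scale a 24/7 monthly price by the fraction of the week the cluster runs."""
    uptime_fraction = (hours_per_day * days_per_week) / HOURS_PER_WEEK
    return full_time_cost * uptime_fraction


production = monthly_cost(180)                                   # always on
staging = monthly_cost(180, hours_per_day=10, days_per_week=5)   # work hours only

print(round(production, 2))            # 180.0
print(round(staging, 2))               # 53.57
print(round(production + staging, 2))  # 233.57
```

Under those assumptions a second environment adds roughly $54 a month rather than another $180.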