Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should I migrate to Redshift?

I'm currently struggling querying be chunk of data that is stored in partitioned table (partition per date)

the data looks like that:

date, product_id, orders
2019-11-01, 1, 100
2019-11-01, 2, 200
2019-11-02, 1, 300

I have hundreds of date-partitions and millions of rows per date.

Now, if I want to query, for instance, total orders for product id 1 and 2 for period of 2 weeks, and group by date (to show in a graph per date), the db has to go to 2 weeks of partitions and fetch the data for them.

That process might be taking a long time when the number of products is big or the time frame required is long.

I have read that AWS Redshift is suitable for this kind of tasks. I'm considering shifting my partitioned tables (aggregated analytics per date) to that technology but I wonder if that's really what I should do to make those queries to run much faster.

Thanks!

like image 235
Yuval Kaufman Avatar asked Mar 03 '23 03:03

Yuval Kaufman


1 Answers

As per your use case Redshift is really a good choice for you. To gain the best performance out of Redshift, it is very important to set proper distribution and sort key. In your case "date" column should be distribution key and "productid" should be sort key. Another important note, Do not encode "date" and "productid" column. You should get better performance.

like image 149
Jahirul Islam Avatar answered Mar 12 '23 20:03

Jahirul Islam