Here is a very condensed version of my setup. There are two tables.
article
-----------------------------------
| id | weight | text |
-----------------------------------
| 1  | 10     | blah |
| 2  | 100    | blah |
| 3  | 50     | blah |
| 4  | 1000   | blah |
-----------------------------------
read
-----------------------------------
| user_id | article_id |
-----------------------------------
| 1       | 4          |
| 1       | 2          |
| 1       | 3          |
| 2       | 3          |
| 2       | 4          |
-----------------------------------
I want to get unread articles using the query below (also condensed):
SELECT
    a.*
FROM
    article a
    LEFT OUTER JOIN read r ON r.article_id = a.id AND r.user_id = 1
WHERE
    r.article_id IS NULL
ORDER BY
    a.weight DESC
LIMIT 10
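For reference, the same anti-join can be written with NOT EXISTS; the optimizer usually treats the two forms equivalently (table and column names as in the schema above, with 1 standing in for the current user's id):

```sql
-- Equivalent anti-join: articles with no matching row in `read` for user 1
SELECT a.*
FROM article a
WHERE NOT EXISTS (
    SELECT 1
    FROM read r
    WHERE r.article_id = a.id
      AND r.user_id = 1
)
ORDER BY a.weight DESC
LIMIT 10;
```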
Important information:
- The number of rows in the read table is kept under 1000 per user (old data is removed).
- The weight column in the article table changes frequently, so the ordering is not fixed.
The problem: once the number of users passes 1M, the read table will grow past 1G rows. It works well so far (the read table currently has about 100M rows), but I have to prepare for the next step because the number of users is increasing rapidly.
What is the best way for large service in this case?
(sharding? partitioning table? or redesign architecture?)
Thanks in advance
Add a column to article. It will be a flag saying whether the article is read/unread. (Do not make it a user count or a timestamp; that will slow down the subsequent steps.)

A problem: since you are purging (after 1000), some "read" articles can become "unread". To deal with this, batch the purging and gather the distinct list of articles that got purged. Then do the tedious task of re-computing the flag, but just for those articles. INDEX(article_id) will help; use EXISTS ( SELECT * FROM read WHERE article_id = $aid ). (This can probably be turned into a batch operation rather than one aid at a time.)
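A minimal sketch of this idea, assuming a new is_read flag column on article and a work table purged_articles holding the distinct article ids collected during one purge batch (both names are hypothetical, not from the original schema):

```sql
-- Hypothetical flag column (name is an assumption)
ALTER TABLE article ADD COLUMN is_read TINYINT NOT NULL DEFAULT 0;

-- Set-based re-computation after a purge batch: only the purged
-- articles are touched, and EXISTS checks whether any `read` row
-- still references each one
UPDATE article a
JOIN purged_articles p ON p.article_id = a.id
SET a.is_read = EXISTS (
    SELECT * FROM read r WHERE r.article_id = a.id
);
```

This is the batch version of the one-$aid-at-a-time EXISTS check; it relies on INDEX(article_id) on the read table to stay cheap.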
Another problem: secondary keys on billion-row tables are costly; they may lead to a lot of I/O. Before attempting to address this problem, please provide SHOW CREATE TABLE for both tables, plus any other common SELECTs. Picking the right index(es) and datatypes is very important to performance in billion-row tables.
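Since the actual SHOW CREATE TABLE output is not shown, here is one plausible shape the read table could take; the composite primary key and datatypes are assumptions, chosen so that both the per-user LEFT JOIN and the per-article EXISTS check can be answered from an index:

```sql
-- Assumed schema sketch, not the asker's actual table
CREATE TABLE read (
    user_id    INT UNSIGNED NOT NULL,
    article_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, article_id),  -- serves the join on (user_id, article_id)
    INDEX (article_id)                  -- serves EXISTS(... WHERE article_id = ?)
) ENGINE = InnoDB;
```

With InnoDB, the primary key is the clustered index, so all rows for one user sit together on disk, which keeps the "last 1000 rows per user" purge and the unread-articles join cheap even at billions of rows.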