I'm creating a game with points for doing little things, so I have a schema as such:
create table points (
    id int,
    points int,
    reason varchar(10)
);
and to get the number of points a user has is trivial:
select sum(points) as total from points where id = ?
However, performance has become more and more of an issue as the points table expands. I want to do something like:
create table pointtotal (
    id int,
    totalpoints int
);
what is the best practice for keeping them in sync? Do I try to update pointtotal on every change? Do I run a daily script?
(Assume I have the right keys - they were left out for conciseness)
Edit:
Here are some characteristics that I left out but should be helpful:
Inserts/updates to Points are not all that frequent. There are a large number of entries, and a large number of requests. The keys were pretty trivial, as you can see.
The best practice is to use a normalized database schema. Then the DBMS keeps it up to date, so you don't have to.
But I understand the tradeoff that makes a denormalized design attractive. In that case, the best practice is to update the total on every change. Investigate triggers. The advantage of this practice is that you can make the total keep in sync with the changes so you never have to think about whether it's out of date or not. If one change is committed, then the updated total is committed too.
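To make the trigger idea concrete, here is a minimal sketch using SQLite through Python's sqlite3 module (MySQL's CREATE TRIGGER syntax is similar but not identical, so treat this as an illustration of the pattern, not MySQL code). The trigger name points_ai is my own invention; the tables follow the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (id INT, points INT, reason VARCHAR(10));
CREATE TABLE pointtotal (id INT PRIMARY KEY, totalpoints INT);

-- Keep pointtotal in sync on every insert: if the insert commits,
-- the updated total commits with it.
CREATE TRIGGER points_ai AFTER INSERT ON points
BEGIN
    -- make sure a total row exists, then add the new points to it
    INSERT OR IGNORE INTO pointtotal (id, totalpoints) VALUES (NEW.id, 0);
    UPDATE pointtotal SET totalpoints = totalpoints + NEW.points
    WHERE id = NEW.id;
END;
""")

conn.execute("INSERT INTO points VALUES (1, 5, 'login')")
conn.execute("INSERT INTO points VALUES (1, 3, 'comment')")
total = conn.execute(
    "SELECT totalpoints FROM pointtotal WHERE id = 1").fetchone()[0]
print(total)  # 8
```

Note that if rows in points can be updated or deleted, you would need matching AFTER UPDATE and AFTER DELETE triggers as well, or the total will drift.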
However, this has some weaknesses with respect to concurrent changes. If you need to accommodate concurrent changes to the same totals, and you can tolerate the totals being "eventually consistent," then use periodic recalculation of the total, so you can be sure only one process at a time is changing the total.
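A periodic recalculation can be written so that it is idempotent: it recomputes the total from the source table rather than incrementing it, so re-running it never double-counts. A sketch, again in SQLite via sqlite3, with a hypothetical recalculate() helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (id INT, points INT, reason VARCHAR(10));
CREATE TABLE pointtotal (id INT PRIMARY KEY, totalpoints INT);
INSERT INTO points VALUES (1, 5, 'login');
INSERT INTO points VALUES (1, 3, 'comment');
""")

def recalculate(user_id):
    # Recompute the total from the source of truth; safe to re-run,
    # and a single periodic job avoids concurrent writers racing on
    # the same total row.
    with conn:  # one transaction per recalculation
        conn.execute("""
            INSERT OR REPLACE INTO pointtotal (id, totalpoints)
            VALUES (?, (SELECT COALESCE(SUM(points), 0)
                        FROM points WHERE id = ?))
        """, (user_id, user_id))

recalculate(1)
total = conn.execute(
    "SELECT totalpoints FROM pointtotal WHERE id = 1").fetchone()[0]
print(total)  # 8
```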
Another good practice is to cache aggregate totals outside the database, e.g. memcached or in application variables, so you don't have to hit the database every time you need to display the value.
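As a sketch of the caching idea, here is an in-process TTL cache standing in for memcached; get_total, fake_db, and TTL_SECONDS are hypothetical names, not part of any library:

```python
import time

_cache = {}        # user_id -> (total, timestamp)
TTL_SECONDS = 60   # serve a possibly stale total for up to a minute

def get_total(user_id, fetch_from_db):
    entry = _cache.get(user_id)
    if entry is not None and time.monotonic() - entry[1] < TTL_SECONDS:
        return entry[0]                    # cache hit: no DB round trip
    total = fetch_from_db(user_id)         # cache miss: one DB query
    _cache[user_id] = (total, time.monotonic())
    return total

db_calls = []
def fake_db(user_id):
    db_calls.append(user_id)               # count how often we hit the DB
    return 8

print(get_total(1, fake_db))  # 8 (miss: queries the "database")
print(get_total(1, fake_db))  # 8 (hit: served from cache)
print(len(db_calls))          # 1
```

The tradeoff is the same as with any cache: displayed totals can lag the database by up to the TTL.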
The query "select sum(points) as total from points where id = ?" should not take 2 seconds, even if you have a huge number of rows and a lot of requests. If you have a covering index defined over (id, points), then the query can produce the result without reading data from the table at all; it can calculate the total by reading values from the index itself. Use EXPLAIN to analyze your query and look for the "Using index" note in the Extra column.
CREATE TABLE Points (
    id INT,
    points INT,
    reason VARCHAR(10),
    KEY id (id, points)
);
EXPLAIN SELECT SUM(points) AS total FROM Points WHERE id = 1;
+----+-------------+--------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+----+-------------+--------+------+---------------+------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | points | ref  | id            | id   | 5       | const |    9 | Using where; Using index |
+----+-------------+--------+------+---------------+------+---------+-------+------+--------------------------+
By all means keep the underlying table normalized. If you can deal with data potentially being one day old, run a script each night (you can schedule it) to do the rollup and populate the new table. It's best to just re-create the thing each night from the source table, to prevent any inconsistencies between the two.
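The nightly rebuild can be done as a drop-and-refill inside a single transaction, so readers never observe a half-built rollup. A sketch in SQLite via sqlite3 (a real deployment would schedule this with cron or similar):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (id INT, points INT, reason VARCHAR(10));
CREATE TABLE pointtotal (id INT PRIMARY KEY, totalpoints INT);
""")
conn.executemany("INSERT INTO points VALUES (?, ?, ?)",
                 [(1, 5, 'login'), (1, 3, 'comment'), (2, 7, 'post')])

# Rebuild the rollup from scratch in one transaction: either the old
# totals or the complete new totals are visible, never a mix.
with conn:  # commits on success, rolls back on any error
    conn.execute("DELETE FROM pointtotal")
    conn.execute("""INSERT INTO pointtotal (id, totalpoints)
                    SELECT id, SUM(points) FROM points GROUP BY id""")

totals = dict(conn.execute("SELECT id, totalpoints FROM pointtotal"))
print(totals)  # {1: 8, 2: 7}
```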
That said, given how small your records are, you must have either a very slow server or a very large number of records, because rows that small, with an index on id, should sum very quickly. However, I am of the mindset that if you can improve user response time by even a few seconds, there is no reason not to use rollup tables, even if DB purists object.