Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SUM of history table in database to show user's total credit (reputation)

Tags:

php

mysql

Explanation:

  • I have an script which shows total credits(reputation) for each user and it has a history table in database for earned credits of all uesrs

Here is a sample of my History Database Table:

 +----------------------------------------------+
 | DATE     ID     USERNAME       CREDITS       |
 +----------------------------------------------+
 | ...      1         X              12         |
 | ...      2         E               2         |
 | ...      3         X               1         |
 | ...      4         X              -7         |
 | ...      5         O               4         |
 +----------------------------------------------+
  • My Script uses SELECT SUM FROM table WHERE username='X' and echoes it, so in this case for user X (12 + 1 - 7) it echoes 6

Questions:

  1. I wanted to know isn't this (SELECT SUM of all history to show a users credit INSTEAD of having a different table for total credids of users) going to make problems if the history table is so huge? (lets say +100,000,000 records after a few years)

  2. Is this what most of professional programmers do? (if not, what is)

  3. What about the history section, if users want to review credits history should we LIMIT it with like LIMIT 100 records when *SELECT*ing or no (for performance)

  4. Is this supposed to run on each page refresh or each page change? (if 1000 users are online and this SELECT query is applied on each refresh doesn't it slow down the server)

EDIT After an answer:

But if we have to keep totals in a different table and update them automatically there are two problems:

  1. If we do it exactly when user have received some credits isn't it possible user have received two different credits at exactly the same time (it is possible), and as we can't put Auto Increment in Totals table (because each user only have 1 record) we might miss 1 credit, Or if there is a solution for this problem, i am not aware of that

  2. If we set a Cron-Job to do it frequently, then the user credits are not up to date until cron job refreshes the totals table

like image 227
Vladimir Avatar asked Jun 30 '13 16:06

Vladimir


7 Answers

If we do it exactly when user have received some credits that is possible user have received two different credits at exactly the same time (very possible), and as we can't put Auto Increment in Totals table (because each user only have 1 record) we might miss 1 credit and don't add it to totals table, Or if there is a solution for this problem, i am not aware of that, i only now we should use AI in these situations

We won't miss that. Check the following SQL statements:

INSERT INTO history SET username = 'X', credits = 2;
UPDATE users SET credits_sum = (SELECT SUM(credits) FROM `history` WHERE username = 'X') WHERE username = 'X';

Even if there is a situation where two events that add credits are fired, our credits_sum would be up to date since it's updated from the data stored in the database (not in the application - in that scenario there could be some differences sometimes).

Of course the primary key from users table should be used instead of username = 'X'.

like image 167
Marcin Krawiec Avatar answered Oct 06 '22 01:10

Marcin Krawiec


To make this scaleable as the number of entries in your database grows, you could consider the following:

Create two tables: one, "historical totals", contains the totals for each user up to 00:00:00 am today; the second can be a (relatively) small table of "today's credits".

When you need current status, you add the lookup from "historical table" to "the new credits" (small table, thus fast). At midnight you add all the day's credits to the totals, then (after a delay) delete the corresponding elements from the "today" table. You need the delay so there is no situation where elements were being deleted from the "current" table as you are querying it. To ensure you always get the right answer you must mark the "historical" data with a "calculated up to date/time" field; and after you have thus updated the totals, you then delete "all information up to this time" from the "current" database. If you first check the totals database for the total & time stamp, then compute the "sum since" from the current database, there ought to be no possibility of error. That's the reason for the delay between updating the totals and deleting items from the current database.

like image 28
Floris Avatar answered Oct 05 '22 23:10

Floris


  1. Yes, it will. I would recommend keeping (sub)totals in a different table and letting a stored procedure update them automatically.
  2. With large scale you have to start denormalizing, so keep a sum so you don't have to recalculate it constantly.
  3. Pagination is a good idea for both performance and usability, seeing thousands of rows does not help readability. I would suggest however, to filter by range (i.e. id BETWEEN x AND y instead of LIMIT 100 OFFSET 500
  4. Yes, it will. If there is anything that doesn't change too often. Cache it. For example... in Redis or Memcached.
like image 38
Wolph Avatar answered Oct 05 '22 23:10

Wolph


I would suggest a separate table to keep track of the total credits for each users, and then use a trigger to keep that table up to date.

Assuming the table to track the total credits looks something like this:

CREATE TABLE reputation (
  username varchar(20) primary key,
  total int
)

then the trigger would look like this:

CREATE TRIGGER historyInsert AFTER INSERT ON history
FOR EACH ROW BEGIN
  INSERT INTO reputation (username,total)
  VALUES (NEW.username,NEW.credits)
  ON DUPLICATE KEY UPDATE total = total + NEW.credits;
END

When anything is inserted into your history table, it fires off this trigger. For each inserted row, the trigger either inserts a new value for into the reputation table, or updates the total value if the user already exists.

Note that INSERT ... ON DUPLICATE KEY UPDATE is an atomic operation in MySQL so you don't have to worry about two updates happening at the same time.

SQL Fiddle demo

As an alternative to creating a separate reputation table, if you already have a users table of some form, you could always store the total credits for each user there. Assumedly there would already be an entry for each user, so the trigger wouldn't have to worry about creating new entries - it would just be updating them.

The trigger code then becomes even simpler:

CREATE TRIGGER historyInsert AFTER INSERT ON history
FOR EACH ROW BEGIN
  UPDATE users SET total = total + NEW.credits
  WHERE username = NEW.username
END

Again, this UPDATE query is atomic. It's just incrementing the total field, so if two updates happen at the same time, they won't overwrite each other - both amounts will be added to the total.

And this is way more efficient than having to calculate a SUM over the entire history every time a new value is inserted.

like image 24
James Holderness Avatar answered Oct 06 '22 00:10

James Holderness


  1. Like the others here, I'd advocate splitting into a "live" and "historical" table for user credits. You can have a nightly (or weekly, or whatever) job migrate the records from live to historical. If you can keep the "live" table compact enough that it (and it's supporting indexes) are largely in memory, performance shouldn't be a problem. You might want to add a third "total credits" table at the end of whatever job you use to maintain the historical table: that way, looking at credit totals (excluding today's) is a single indexed read.

  2. Presumably, the credits are immutable once added. So there is little point in forcing your program to re-add them over, and over and over, if they don't change. If you don't need transactional detail for historical credits, keep them summed by month.

  3. The limit would help some, but highlights a design deficiency: don't store records you won't reference: they continue to use disk space, index space, and memory. You have to be fairly rational (and cold-blooded) about what you actually need. Look at your business model: why do you want users to be able to review their credits history? And will you alienate them if you cut off what they can view at some arbitrary limit? You have to be able to figure out the policy yourself, because you know your business and your users. But make the technology the servant of the policy, not the other way around.

  4. These questions go to overall architecture: there are certainly ways of caching query results during the course of a web session, if those queries are expensive. It depends on your overall architecture and the technology stack you are using.

--- Second set of questions

  1. Move credits into history at day boundaries. Even on the "live" table, use the current date as part of your select criteria. That way, you'll never drop (or double-count) credits inadvertantly.

  2. Not sure I understand. Credits will be insterted into the "live" table at the precise moment they are earned, and then copied into the historical table at the day boundary. The "live" table will always be up to date for that day, and the historical table will always be up to date for the stuff older than one day.

I hope your project goes well...

like image 29
Curt Avatar answered Oct 06 '22 01:10

Curt


What I would say to do is to keep track of your historical data as you are now but also cache the final result in either a credits table or in a property of the user table.

In pseudo-code:

 function postCreditTransaction($username, integer $credit){
      $db->insert("credit_history", array("USERNAME"=>$username, "CREDIT"=>$credit));
      $db->update("update user_table set credit = credit + $credit where username = ".$db->quote($username));
 }

This will give you the detail provided by the credit history but low fat access to the total.

To make sure everything is in step you can perform a periodic audit of the credit_history table against the cached value in the cache field.

like image 20
Orangepill Avatar answered Oct 06 '22 01:10

Orangepill


Ok, lets start from short resume:

  1. Yes, you need to store pre-calculated reputation for performance purposes.
  2. If there is table with users' info - add field "reputation_sum" (there is no sense to separate this data), if not - make special table.
  3. When reputation changes you will know difference, add that difference to the "reputation_sum".

Here i mean - don’t use "SELECT SUM of all history..." to calculate new value of "reputation_sum". When you add/update/delete record from the "history" table calculate total_reputation_change_value and update "reputation_sum" without recalculating sum on all records of the "history" table. total_reputation_change_value for INSERT operation will be - value of the "credits" field; same for DELETE, but with unary minus; difference between old and new values for UPDATE. This will give significant more requests/s if reputation is changing often. This also will violate data integrity a bit more. If you afraid of this - make special cron job, which will refresh "reputation_sum" data by summing records from history periodically. But in most cases (with correct defined work-flow) there is no need to do this.

Also I advice you not use USERNAME as a foreign key (if you have "users" table and this is foreign key). Better make integer USERID. It will search faster in history table.

Now let me answer you questions.

I wanted to know isn't this (SELECT SUM of all history to show a users credit INSTEAD of having a different table for total credids of users) going to make problems if the history table is so huge? (lets say +100,000,000 records after a few years)

Yeah, if calculate reputation every time from table, which has "lets say +100,000,000 records after a few years", it will be really inefficient, because of amount of calculations. Maybe there will be no lags, if you have enough servers, but i'm sure they will)

Is this what most of professional programmers do? (if not, what is).

This is common solution, and it works fine in most cases. Maybe it is not optimal one for you, but we don't have enough information to advice better. In this kind of situation professional programmers can use bunch of methods, depends on specifics of a project.

Also good solution for problems like this is caching data. But it serves for a bit different needs. You should be sure that users make complex but identical requests and data is changed not very often.

If data is changed not very often other good optimization trick - making index.

What about the history section, if users want to review credits history should we LIMIT it with like LIMIT 100 records when *SELECT*ing or no (for performance)

Of course you should. In most situations users cannot see all of 100 (200, 300) items same time. Also they will look for ALL records (as i understand, they'll have alot records in this section) not every time. Even if user will see all records, that will anyway take some time in seconds or maybe minutes. Using limitations for single request will distribute the load over time and decrease load peaks. This will increase average performance for users.

So for performance benefits you should provide partial load functionality for big amounts of content.

Is this supposed to run on each page refresh or each page change? (if 1000 users are online and this SELECT query is applied on each refresh doesn't it slow down the server)

Any activity of users will slow down your server, this is something impossible to fix:) But here we talk about efficient of using different methods, for obtaining needed functionality. As for me, i dont know what "if 1000 users are online and this SELECT query is applied on each refresh" means. Is this a forum where you can see alot of user records with reputation? Or maybe it is profile page with only one reputation? Or maybe you want to see reputation of that 1000 online users, without offline?

If we do it exactly when user have received some credits isn't it possible user have received two different credits at exactly the same time (it is possible), and as we can't put Auto Increment in Totals table (because each user only have 1 record) we might miss 1 credit, Or if there is a solution for this problem, i am not aware of that

You shouldn't care about transactional integrity, because it is DBMS problem. You should only bring changes to "reputation_sum" field every time reputation changed. I mean - just do SQL request.

If we set a Cron-Job to do it frequently, then the user credits are not up to date until cron job refreshes the totals table

Don't use cron. Or use just for data actualization, if you want.

like image 32
QArea Avatar answered Oct 05 '22 23:10

QArea