Best practice to create a Summary Table for an ever-growing table in MySQL

I have a table called transactions with ~20 million records. This table grows every second.

I calculate a user's current balance with:

SELECT sum(`amount`) FROM `transactions` WHERE `user_id` = 1000;

I show the user's current balance in the top bar of my web application, so users can see how much balance they have.

Obviously, every time a user browses a page of my web app, the above query must be executed to calculate that user's current balance!

I want to create a Summary Table to obtain the current user balance without querying on that transactions table with ~20 million records!

Be aware that in our workflow it is very common for a user to have multiple transactions simultaneously (a user may even have several transactions within a single second).

I think we have two approaches here:

The First Approach

Creating a Summary Table with One-To-One relationship as below:

ID  |  user_id  |  current_balance
1   |  1000     |      8590
2   |  1001     |      235
3   |  1002     |      3780
... |  ...      |      ...

Every time a new record is inserted into the transactions table, we trigger a stored procedure to update the user's current_balance in the Summary Table.
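
A minimal sketch of this first approach, not a definitive implementation. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own; MySQL's trigger syntax differs, and the table name `balances`, the trigger name, and the column types are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    amount  INTEGER NOT NULL
);
-- Hypothetical summary table: one row per user.
CREATE TABLE balances (
    user_id         INTEGER PRIMARY KEY,
    current_balance INTEGER NOT NULL
);
-- After each new transaction, add its amount to the user's summary row.
CREATE TRIGGER trg_transactions_insert AFTER INSERT ON transactions
BEGIN
    -- Create the summary row on the user's first transaction.
    INSERT OR IGNORE INTO balances (user_id, current_balance)
    VALUES (NEW.user_id, 0);
    UPDATE balances
       SET current_balance = current_balance + NEW.amount
     WHERE user_id = NEW.user_id;
END;
""")

# Insert a few transactions; the trigger fires once per inserted row.
with conn:
    conn.executemany(
        "INSERT INTO transactions (user_id, amount) VALUES (?, ?)",
        [(1000, 8590), (1000, 50), (1000, -30), (1001, 235)],
    )


def summary_balance(user_id):
    """Read the balance from the one-row-per-user summary table."""
    row = conn.execute(
        "SELECT current_balance FROM balances WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0


def summed_balance(user_id):
    """Recompute the balance the slow way, from the transactions table."""
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions WHERE user_id = ?",
        (user_id,),
    ).fetchone()[0]


print(summary_balance(1000), summed_balance(1000))
```

Both functions should report the same value for any user; comparing them periodically is one way to check that the summary has not drifted from the log.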

I don't know if this approach breaks MySQL consistency or not!

The Second Approach

Creating a Summary Table with One-To-Many relationship as below:

ID  |  user_id  |  amount
1   |  1000     |   8590    <--- it's the initial user balance
2   |  1001     |   235     <--- it's the initial user balance
3   |  1002     |   3780    <--- it's the initial user balance
4   |  1000     |   50
5   |  1000     |   -30
6   |  1001     |   10
7   |  1002     |   60
8   |  1000     |   -45

We clear out our Summary Table nightly (for example at 00:00), recalculate the current balance for all users from the transactions table, and insert the results into the Summary Table. To determine a user's current balance we just need to do this:

SELECT sum(`amount`) FROM `users_balance` WHERE `user_id` = 1000;

But something worries me about this approach: what if some users make transactions at exactly the moment we are recalculating the current balances and inserting them into the Summary Table (exactly at 00:00)?

Does this approach break consistency?
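
For reference, the nightly rebuild described above could be sketched roughly as follows. This is an illustration under assumptions, not an answer to the consistency question: it uses Python's sqlite3 module as a stand-in for MySQL, treats every sample row as an ordinary transaction, and assumes each new transaction is written to both tables. The function name `rebuild_summary` is hypothetical. How a rebuild like this behaves while other sessions are writing on MySQL depends on the storage engine and isolation level, which this sketch does not exercise.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, amount INTEGER NOT NULL
);
CREATE TABLE users_balance (
    id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, amount INTEGER NOT NULL
);
""")

# Sample rows taken from the table in the question.
rows = [
    (1000, 8590), (1001, 235), (1002, 3780), (1000, 50),
    (1000, -30), (1001, 10), (1002, 60), (1000, -45),
]
with conn:
    conn.executemany(
        "INSERT INTO transactions (user_id, amount) VALUES (?, ?)", rows)
    conn.executemany(
        "INSERT INTO users_balance (user_id, amount) VALUES (?, ?)", rows)


def rebuild_summary(conn):
    """Clear the summary and refill it with one row per user, in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM users_balance")
        conn.execute(
            "INSERT INTO users_balance (user_id, amount) "
            "SELECT user_id, SUM(amount) FROM transactions GROUP BY user_id"
        )


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM users_balance").fetchone()[0]


def balance(conn, user_id):
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM users_balance WHERE user_id = ?",
        (user_id,),
    ).fetchone()[0]


rows_before = row_count(conn)
balance_before = balance(conn, 1000)
rebuild_summary(conn)
rows_after = row_count(conn)
balance_after = balance(conn, 1000)
print(rows_before, rows_after, balance_before, balance_after)
```

The rebuild shrinks the summary to one row per user while leaving each user's summed balance unchanged.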


Please tell me if you know any better practice for this workflow.

P.S.

Our web app is an SMS panel through which users can send and receive SMS, either directly in the panel or through an API. Some of our users send 1 million or more SMS in a day!

Every time an SMS is sent, a new record must be inserted in the transactions table.

I know 20 million records is not a big deal and we can achieve good performance with indexes, but as I mentioned above it's an ever-growing table. I'm pretty sure that by next year we'll have hundreds of millions of records in the transactions table.

Hamed Kamrava asked May 08 '17 11:05


1 Answer

You're maintaining a balance for each user, as you have explained.

Your best bet is to write application code that carries out two queries, perhaps in a transaction, but probably not.

One query:

      UPDATE balances
         SET current_balance = current_balance - 1 
       WHERE user_id = 1000

That query, in itself, maintains consistency without any need for a transaction.

(Edit) It looks for the row of the balances table with user_id=1000 and then subtracts one from the value of current_balance in that row, reading, modifying, then writing the row. You can do this kind of arithmetic with column values in INSERT and UPDATE queries as needed.

The other query:

      INSERT INTO transactions (columns) VALUES (values)

The way you have explained your application, it sounds like the integrity of your business depends upon the table I'm calling balances in my first query. The transactions table is a log of user activity, and serves to explain how a customer balance got to be what it is. So, if you get your application to perform the two queries I propose in order, you will have excellent balances values and good-enough logging. That's a good way to structure a transactional database.
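
Here is a rough sketch of what that application code might look like. It is only an illustration: it uses Python's sqlite3 module in place of MySQL so it can run by itself, the function name `charge_sms` and the transactions columns are made up, and it wraps the two queries in a transaction, which is optional as noted above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balances (
    user_id         INTEGER PRIMARY KEY,
    current_balance INTEGER NOT NULL
);
-- Hypothetical log columns; the real transactions table will have more.
CREATE TABLE transactions (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    amount  INTEGER NOT NULL
);
INSERT INTO balances (user_id, current_balance) VALUES (1000, 8590);
""")


def charge_sms(conn, user_id, cost=1):
    """Debit the balance first, then log the transaction."""
    with conn:  # commits both statements together, or rolls both back
        cur = conn.execute(
            "UPDATE balances SET current_balance = current_balance - ? "
            "WHERE user_id = ?",
            (cost, user_id),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no balance row for user {user_id}")
        conn.execute(
            "INSERT INTO transactions (user_id, amount) VALUES (?, ?)",
            (user_id, -cost),
        )


# Three messages sent by user 1000.
for _ in range(3):
    charge_sms(conn, 1000)

balance = conn.execute(
    "SELECT current_balance FROM balances WHERE user_id = ?", (1000,)
).fetchone()[0]
logged = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE user_id = ?", (1000,)
).fetchone()[0]
print(balance, logged)
```

The top bar of the web app then only needs the single-row SELECT on balances, however large the transactions table grows.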

Why should your balances be maintained separately from your transaction log? What if you want to give a customer 100 free messages? What if you want to start charging extra for messages at a certain time of day? What if a customer demands a credit for a batch of messages that were, according to her, handled incorrectly? If you make your balances from your transactions table, you're going to have to put all sorts of bizarre stuff into that table to handle your evolving business rules.

Would I bury the update of the balances table in a trigger if I were you? No, I would not. I'd make it part of your application. Easier to see, easier to debug, etc.

O. Jones answered Oct 15 '22 09:10