I have a table called `transactions` with ~20 million records. This table grows every second.
I calculate a user's current balance with:
SELECT sum(`amount`) FROM `transactions` WHERE `user_id` = 1000;
I show the user's current balance in the top bar of my web application, so the user can always see how much balance they have.
Obviously, every time a user browses a page of my web app, the above query must be executed to calculate that user's current balance.
I want to create a Summary Table so I can obtain the current user balance without querying the `transactions` table with its ~20 million records.
Be aware that in our workflow it is very common for a user to have multiple transactions at the same time (a user may even have multiple transactions within a single second).
I think we have two approaches here:
The First Approach
Creating a Summary Table with a One-To-One relationship, as below:
| ID  | user_id | current_balance |
|-----|---------|-----------------|
| 1   | 1000    | 8590            |
| 2   | 1001    | 235             |
| 3   | 1002    | 3780            |
| ... | ...     | ...             |
And every time a new record is inserted into the `transactions` table, we trigger a stored procedure to update the user's `current_balance` in the Summary Table.
I don't know whether this approach breaks MySQL consistency or not.
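A rough sketch of what I have in mind (the table, column, and trigger names below are just placeholders, and the numeric types are an assumption):

```sql
-- Placeholder summary table: one row per user
CREATE TABLE user_balances (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    current_balance BIGINT NOT NULL DEFAULT 0,
    UNIQUE KEY uq_user (user_id)
);

-- Placeholder trigger: keep the summary row in sync on every insert
CREATE TRIGGER trg_transactions_after_insert
AFTER INSERT ON transactions
FOR EACH ROW
    INSERT INTO user_balances (user_id, current_balance)
    VALUES (NEW.user_id, NEW.amount)
    ON DUPLICATE KEY UPDATE current_balance = current_balance + NEW.amount;
```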
The Second Approach
Creating a Summary Table with a One-To-Many relationship, as below:
| ID | user_id | amount |                        |
|----|---------|--------|------------------------|
| 1  | 1000    | 8590   | ← initial user balance |
| 2  | 1001    | 235    | ← initial user balance |
| 3  | 1002    | 3780   | ← initial user balance |
| 4  | 1000    | 50     |                        |
| 5  | 1000    | -30    |                        |
| 6  | 1001    | 10     |                        |
| 7  | 1002    | 60     |                        |
| 8  | 1000    | -45    |                        |
We clear out our Summary Table nightly (for example at 00:00), recalculate the current balance for every user from the `transactions` table, and insert the results into the Summary Table. To determine a user's current balance we then just need to do this:
SELECT sum(`amount`) FROM `users_balance` WHERE `user_id` = 1000;
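A rough sketch of the nightly job (illustrative statements only, not our actual code):

```sql
-- Rebuild the summary table from scratch each night (illustrative)
TRUNCATE TABLE users_balance;

INSERT INTO users_balance (user_id, amount)
SELECT user_id, SUM(amount)
FROM transactions
GROUP BY user_id;
```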
But there is something that worries me about this approach: what if some users perform transactions at exactly the time we are recalculating the balances and inserting them into the Summary Table (i.e. exactly at 00:00)?
Does this approach break consistency?
Please tell me if you know of a better practice for this workflow.
P.S.
Our web app is an SMS panel through which users can send/receive SMS, either directly in the panel or via an API. Some of our users send 1 million or more SMS messages in a day!
Every time an SMS is sent, a new record must be inserted into the `transactions` table.
I know 20 million records is not a big deal and we can achieve good performance with indexes, but as I mentioned above it's an ever-growing table. I'm pretty sure that next year we'll have hundreds of millions of records in the `transactions` table.
You're maintaining a balance for each user, as you have explained.
Your best bet is to write application code that carries out two queries, perhaps in a transaction, though you probably don't need one.
One query:
UPDATE balances
SET current_balance = current_balance - 1
WHERE user_id = 1000
That query, in itself, maintains consistency without any need for a transaction.
(Edit) It looks up the row of the `balances` table with `user_id = 1000` and subtracts one from the value of `current_balance` in that row, reading, modifying, then writing the row in a single statement. You can do this kind of arithmetic with column values in `INSERT` and `UPDATE` queries as needed.
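For example, assuming `user_id` has a unique key on `balances`, a single (hypothetical) upsert can create a missing row or adjust an existing one:

```sql
-- Create the balance row if it doesn't exist, otherwise subtract from it
INSERT INTO balances (user_id, current_balance)
VALUES (1000, -1)
ON DUPLICATE KEY UPDATE current_balance = current_balance - 1;
```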
The other query:
INSERT INTO transactions (columns) VALUES (values)
The way you have explained your application, it sounds like the integrity of your business depends on the table I'm calling `balances` in my first query. The `transactions` table is a log of user activity, and serves to explain how a customer's balance got to be what it is. So, if you get your application to perform the two queries I propose, in order, you will have accurate balance values and good-enough logging. That's a good way to structure a transactional database.
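A minimal sketch of the two statements issued in order from the application (the column names on `transactions` are assumptions here, and the transaction wrapper is optional):

```sql
START TRANSACTION;  -- optional, as discussed above

-- 1) Maintain the authoritative balance
UPDATE balances
SET current_balance = current_balance - 1
WHERE user_id = 1000;

-- 2) Log the activity that explains the balance
INSERT INTO transactions (user_id, amount, created_at)
VALUES (1000, -1, NOW());

COMMIT;
```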
Why should your balances be maintained separately from your transaction log? What if you want to give a customer 100 free messages? What if you want to start charging extra for messages at a certain time of day? What if a customer demands a credit for a batch of messages that were, according to her, handled incorrectly? If you make your balances from your `transactions` table, you're going to have to put all sorts of bizarre stuff into that table to handle your evolving business rules.
Would I bury the update of the `balances` table in a trigger if I were you? No, I would not. I'd make it part of your application. Easier to see, easier to debug, etc.