Best practice to create a Summary Table for an ever-growing table in MySQL

Tags: sql, database, mysql, database-design

I have a table called transactions with ~20 million records. This table grows every second.

I calculate a user's current balance with:

SELECT sum(`amount`) FROM `transactions` WHERE `user_id` = 1000;

I show the user's current balance in the top bar of my web application, so users can see how much balance they have.

Obviously, every time a user loads a page of my web app, the above query must be executed to calculate that user's current balance.

I want to create a Summary Table so I can obtain the current user balance without querying the transactions table and its ~20 million records.

Be aware that in our workflow it is very common for a user to have multiple transactions simultaneously (a user may even have several transactions within a single second).

I think we have two approaches here:

The First Approach

Create a Summary Table with a one-to-one relationship, as below:

ID  |  user_id  |  current_balance
1   |  1000     |      8590
2   |  1001     |      235
3   |  1002     |      3780
... |  ...      |      ...

Every time a new record is inserted into the transactions table, we trigger a stored procedure to update the user's current_balance in the Summary Table.
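A minimal sketch of what this first approach could look like. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name `user_balance_summary`, the trigger name, and the column types are assumptions made for illustration. MySQL's trigger syntax differs slightly (for example, it requires `FOR EACH ROW`), so treat this as an outline of the idea rather than MySQL-ready DDL.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    amount  INTEGER NOT NULL
);

-- Hypothetical one-row-per-user summary table.
CREATE TABLE user_balance_summary (
    user_id         INTEGER PRIMARY KEY,
    current_balance INTEGER NOT NULL
);

-- After every insert into transactions, adjust that user's balance.
CREATE TRIGGER trg_transactions_after_insert
AFTER INSERT ON transactions
BEGIN
    UPDATE user_balance_summary
       SET current_balance = current_balance + NEW.amount
     WHERE user_id = NEW.user_id;
END;
""")

# Seed the initial balance, then log three transactions for user 1000.
conn.execute(
    "INSERT INTO user_balance_summary (user_id, current_balance) VALUES (1000, 8590)"
)
for amount in (50, -30, -45):
    conn.execute(
        "INSERT INTO transactions (user_id, amount) VALUES (?, ?)", (1000, amount)
    )
conn.commit()

# The balance is now a single-row lookup instead of a SUM over transactions.
balance = conn.execute(
    "SELECT current_balance FROM user_balance_summary WHERE user_id = 1000"
).fetchone()[0]
print(balance)
```

With the summary row in place, showing the balance in the top bar becomes a primary-key lookup, regardless of how large the transactions table grows.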

I don't know if this approach breaks MySQL consistency or not!

The Second Approach

Create a Summary Table with a one-to-many relationship, as below:

ID  |  user_id  |  amount
1   |  1000     |   8590    <--- it's the initial user balance
2   |  1001     |   235     <--- it's the initial user balance
3   |  1002     |   3780    <--- it's the initial user balance
4   |  1000     |   50
5   |  1000     |   -30
6   |  1001     |   10
7   |  1002     |   60
8   |  1000     |   -45

We clear out our Summary Table nightly (for example at 00:00), recalculate the current balance for all users from the transactions table, and insert the results into the Summary Table. To determine a user's current balance we just need to do this:

SELECT sum(`amount`) FROM `users_balance` WHERE `user_id` = 1000;

But something worries me about this approach. What if some users make transactions at exactly the moment we are recalculating the users' current balances and inserting them into the Summary Table (exactly at 00:00)?

Does this approach break consistency?

Please tell me if you know any better practice for this workflow.

P.S.

Our web app is an SMS panel through which users can send and receive SMS, either directly in the panel or via an API. We have some users who send 1 million or more SMS in a day!

Every time an SMS is sent, a new record must be inserted in the transactions table.

I know 20 million records is not a big deal and we can achieve good performance with indexes, but as I mentioned above it's an ever-growing table. I'm pretty sure that next year we'll have hundreds of millions of records in the transactions table.


asked May 08 '17 11:05

Hamed Kamrava

1 Answer

You're maintaining a balance for each user, as you have explained.

Your best bet is to write application code that carries out two queries. You could wrap them in a transaction, but you probably don't need to.

One query:

      UPDATE balances
         SET current_balance = current_balance - 1
       WHERE user_id = 1000;

That query, by itself, maintains consistency without any need for an explicit transaction, because a single UPDATE statement is applied atomically.

(Edit) It looks for the row of the balances table with user_id=1000 and then subtracts one from the value of current_balance in that row, reading, modifying, then writing the row. You can do this kind of arithmetic with column values in INSERT and UPDATE queries as needed.
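To make that concrete, here is a minimal sketch of issuing the update from application code. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the helper name `debit` and the column types are assumptions for illustration. The point is that the subtraction happens inside the database in one statement, so the application never reads the balance, changes it in memory, and writes it back.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances ("
    " user_id INTEGER PRIMARY KEY,"
    " current_balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO balances (user_id, current_balance) VALUES (1000, 8590)")
conn.commit()


def debit(conn, user_id, amount):
    """Subtract `amount` from the user's balance in a single UPDATE.

    Returns True if exactly one row was changed, False if the user
    has no balance row. (Hypothetical helper, not part of any library.)
    """
    cur = conn.execute(
        "UPDATE balances"
        "   SET current_balance = current_balance - ?"
        " WHERE user_id = ?",
        (amount, user_id),
    )
    conn.commit()
    return cur.rowcount == 1


ok = debit(conn, 1000, 1)       # existing user: one row updated
missing = debit(conn, 9999, 1)  # unknown user: no row updated

new_balance = conn.execute(
    "SELECT current_balance FROM balances WHERE user_id = 1000"
).fetchone()[0]
print(ok, missing, new_balance)
```

Checking the affected-row count is an addition of this sketch; it gives the application a cheap way to notice a missing balance row.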

The other query:

      INSERT INTO transactions (columns) VALUES (values)

The way you have explained your application, it sounds like the integrity of your business depends upon the table I'm calling balances in my first query. The transactions table is a log of user activity, and serves to explain how a customer's balance got to be what it is. So, if you get your application to perform the two queries I propose, in order, you will have excellent balance values and good-enough logging. That's a good way to structure a transactional database.
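As a rough sketch of how the two queries might sit together in application code (not necessarily the code this answer has in mind): the example below uses Python's built-in sqlite3 module as a stand-in for MySQL, and the function name `record_sms`, the `cost` parameter, and the `user_id`/`amount` columns of transactions are assumptions for illustration. It wraps both statements in one transaction, which is the optional, more cautious variant.

```python
import sqlite3


def record_sms(conn, user_id, cost=1):
    """Debit the balance first, then log the transaction.

    Hypothetical helper. `with conn:` commits if both statements succeed
    and rolls back if an exception is raised inside the block.
    """
    with conn:
        cur = conn.execute(
            "UPDATE balances"
            "   SET current_balance = current_balance - ?"
            " WHERE user_id = ?",
            (cost, user_id),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no balance row for user {user_id}")
        conn.execute(
            "INSERT INTO transactions (user_id, amount) VALUES (?, ?)",
            (user_id, -cost),
        )


# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances ("
    " user_id INTEGER PRIMARY KEY,"
    " current_balance INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE transactions ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " amount INTEGER NOT NULL)"
)
conn.execute("INSERT INTO balances (user_id, current_balance) VALUES (1000, 8590)")
conn.commit()

# Two messages sent by user 1000.
record_sms(conn, 1000)
record_sms(conn, 1000)

balance = conn.execute(
    "SELECT current_balance FROM balances WHERE user_id = 1000"
).fetchone()[0]
logged = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM transactions WHERE user_id = 1000"
).fetchone()
print(balance, logged)
```

Keeping both statements in one function means the balance update stays visible in the application, where it is easy to read and debug.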

Why should your balances be maintained separately from your transaction log? What if you want to give a customer 100 free messages? What if you want to start charging extra for messages at a certain time of day? What if a customer demands a credit for a batch of messages that were, according to her, handled incorrectly? If you make your balances from your transactions table, you're going to have to put all sorts of bizarre stuff into that table to handle your evolving business rules.

Would I bury the update of the balances table in a trigger if I were you? No, I would not. I'd make it part of your application. Easier to see, easier to debug, etc.


answered Oct 15 '22 09:10

O. Jones
