I have a web server on which I've implemented my own messaging system. I am at a phase where I need to create an API that checks if the user has new messages.
My DB table is simple:
ID - Auto Increment, Primary Key (Bigint)
Sender - Varchar (32) // Foreign Key to UserID hash from Users DB Table
Recipient - Varchar (32) // Foreign Key to UserID hash from Users DB Table
Message - Varchar (256) //UTF8 BIN
I am considering making an API that will estimate if there are new messages for a given user. I am thinking of using one of these methods:
A) Select count(ID) of messages where sender or recipient is me (if this number > previous number, I have a new message).
B) Select max(ID) of messages where sender or recipient is me (if max(ID) > previous number, I have a new message).
My question is: Can I somehow work out which method will consume fewer server resources? Is there an article on this? Or is there another method I didn't mention?
In this case, COUNT(id) counts the number of rows in which id is not NULL.
COUNT(*), by contrast, counts all matching rows, including rows that contain NULL values. Because ID is the primary key and can never be NULL, the two return the same number here.
Your use of COUNT(*) or COUNT(column) should be based on the desired output only. For a non-nullable column such as ID, COUNT(ID) should not be expected to be meaningfully faster than COUNT(*); InnoDB generally processes the two the same way.
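To make the difference between the COUNT variants concrete, here is a small sketch using Python's standard-library sqlite3 module as a stand-in for MySQL (an assumption, chosen only so the example is runnable; the NULL-handling rule shown is standard SQL). The demo table and its nullable note column are hypothetical and not part of the question's schema.

```python
import sqlite3

# Hypothetical demo table: id is never NULL, note may be NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany(
    "INSERT INTO demo (id, note) VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c")],
)

# COUNT(*) counts rows; COUNT(column) counts rows where that column is not NULL.
count_star, count_id, count_note = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(note) FROM demo"
).fetchone()

print(count_star, count_id, count_note)  # 3 3 2
```

For a primary key such as ID, the first two numbers always agree, so the choice between them does not change the result.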
In MySQL InnoDB, SELECT COUNT(id) WHERE secondary_index = ? is an expensive operation, and when the user has a lot of messages this query might take a long time. Even when using an index, the engine still needs to count all matching records, so performance degrades as the number of matching messages grows.
On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index in almost constant time by doing a simple drilldown in the B-Tree structure of the index.
If you want to understand why, consider looking up how the B+Tree data structure works, which is used by InnoDB to structure the rows of your tables and indexes.
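As a rough illustration of the MAX(id) approach, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere; MySQL would use GREATEST() where SQLite uses a two-argument MAX()). The table layout follows the question, while the index names and the helpers make_db, latest_message_id and has_new_messages are hypothetical. The OR condition is split into two per-column lookups, which is intended to let each MAX be resolved from its own index.

```python
import sqlite3


def make_db():
    """Create an in-memory table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            sender    TEXT NOT NULL,
            recipient TEXT NOT NULL,
            message   TEXT NOT NULL
        );
        -- Assumed indexes: one per lookup column, with id as the second key.
        CREATE INDEX idx_messages_sender    ON messages (sender, id);
        CREATE INDEX idx_messages_recipient ON messages (recipient, id);
        """
    )
    return conn


def latest_message_id(conn, user_id):
    """Return the highest message id involving user_id, or 0 if there is none."""
    row = conn.execute(
        """
        SELECT MAX(
            COALESCE((SELECT MAX(id) FROM messages WHERE sender = ?), 0),
            COALESCE((SELECT MAX(id) FROM messages WHERE recipient = ?), 0)
        )
        """,
        (user_id, user_id),
    ).fetchone()
    return row[0]


def has_new_messages(conn, user_id, last_seen_id):
    """True if a message newer than last_seen_id exists for user_id."""
    return latest_message_id(conn, user_id) > last_seen_id
```

A client would keep the last id it has seen and pass it back on each poll, for example has_new_messages(conn, "bob", 41). In InnoDB a secondary index already carries the primary key, so an index on the column alone may be enough there.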
I suggest you go with SELECT MAX(id) if the requirement is only to check whether there are new messages (and not how many there are).
Also, if you rely on the message count, you open the door to a race condition. What if the user deletes a message and receives a new one between two polling intervals? The count would be unchanged, so the new message would go unnoticed.
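The delete-then-receive scenario can be reproduced in a few lines. This is a sketch only, again using Python's sqlite3 module in place of MySQL (an assumption); the snapshot helper and the sample users are hypothetical. It relies on auto-increment ids not being reused after a delete, which is what AUTOINCREMENT provides in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " sender TEXT NOT NULL, recipient TEXT NOT NULL, message TEXT NOT NULL)"
)
INSERT = "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)"


def snapshot(conn, user_id):
    """Return (COUNT(id), MAX(id)) over the messages that involve user_id."""
    return conn.execute(
        "SELECT COUNT(id), COALESCE(MAX(id), 0) FROM messages"
        " WHERE sender = ? OR recipient = ?",
        (user_id, user_id),
    ).fetchone()


# First poll: bob has two messages.
conn.executemany(INSERT, [("alice", "bob", "hi"), ("alice", "bob", "are you there?")])
before = snapshot(conn, "bob")

# Between polls: bob deletes one message and receives a new one.
conn.execute("DELETE FROM messages WHERE id = 1")
conn.execute(INSERT, ("carol", "bob", "hello"))
after = snapshot(conn, "bob")

print(before, after)  # (2, 2) (2, 3)
```

The count is 2 at both polls, so a count-based check reports nothing new, while the maximum id moves from 2 to 3.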
To know that someone has new messages, record exactly that. Update a field in the users table (I'm assuming that's the name) when a new message is recorded in the system. You have the recipient's ID, and that's all you need. You can create an after insert trigger (assumption: there's a users2messages table) that updates the users table with a boolean flag indicating there's a message.
This approach is by far faster than counting index entries, be the index primary or secondary. When the user performs an action (for example, opens their messages), you can update the users table with has_messages = 0; when a new message arrives, you update the table with has_messages = 1. It's simple, it works, it scales, and using triggers to maintain it makes it easy and seamless.
I'm sure there will be nay-sayers who don't like triggers. If you are one of them, you can do the update manually at the point of associating a user with a new message.
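A minimal sketch of the flag-plus-trigger idea follows, using Python's sqlite3 module as a stand-in for MySQL (an assumption; MySQL's CREATE TRIGGER syntax differs, for instance it requires FOR EACH ROW). The schema is simplified and hypothetical: the recipient is stored directly on messages rather than in a users2messages table, and the trigger name and the helpers has_new_messages and mark_read are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id           TEXT PRIMARY KEY,
        has_messages INTEGER NOT NULL DEFAULT 0   -- boolean flag: 0 or 1
    );
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        sender    TEXT NOT NULL REFERENCES users (id),
        recipient TEXT NOT NULL REFERENCES users (id),
        message   TEXT NOT NULL
    );
    -- Hypothetical trigger: raise the recipient's flag on every new message.
    CREATE TRIGGER messages_after_insert
    AFTER INSERT ON messages
    BEGIN
        UPDATE users SET has_messages = 1 WHERE id = NEW.recipient;
    END;
    """
)


def has_new_messages(conn, user_id):
    """Single-row lookup by primary key; no counting involved."""
    row = conn.execute(
        "SELECT has_messages FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return bool(row and row[0])


def mark_read(conn, user_id):
    """Reset the flag once the user has seen their messages."""
    conn.execute("UPDATE users SET has_messages = 0 WHERE id = ?", (user_id,))
```

Without a trigger, the same UPDATE can be issued by the application in the same transaction as the INSERT into messages.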