Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Private messaging system, large single table versus many small tables

I'm considering a design for a private messaging system and I need some input here, basically I have several questions regarding this. I've read most of the related questions and they've given me some thought already.

All of the basic messaging systems I've thus far looked into use a single table for all of the users' messages. With indexes etc this approach would seem fine.

What I wanted to know is if there would be any benefit to splitting the user messages into separate tables. So when a new user is created a new table is created (either in the same or a dedicated message database) which stores all of the messages - sent and received -for that user.

What are the pitfalls/benefits to approaching things that way? I'm writing in PHP would the code required to write be particularly more cumbersome than the first large table option? Would the eventual result, with a large amount of smaller tables be a more robust, trouble free design than one large table? In the event of large amounts of concurrent users, how would the performance of the server compare where dealing with one large versus many small tables?

Any help with those questions or other input would be appreciated. I'm currently working through a smaller scale design for my test site before rewriting the PM module and would like to optimise it. My poor human brain handles separate table far more easily, but the same isn't necessarily so for a computer.

like image 507
TooManyCooks Avatar asked Mar 01 '10 22:03

TooManyCooks


2 Answers

You'll just get headaches from moving to small numerous tables. Databases are made for handling lots of data, let it do it's thing.

  • You'll likely end up using dynamic table names in queries (SELECT * FROM $username WHERE ...), making smart features like stored procedures and possibly parameterized queries a lot trickier if not outright impossible. Usually a really bad idea.

  • Try rewriting SELECT * FROM messages WHERE authorID = 1 ORDER BY date_posted DESC, but where "messages" is anywhere between 1 and 30,000 different tables. Keeping your table relations monogamous will keep them bidirectional, way more useful.

If you think table size will really be a problem, set up an "archived messages" clone table and periodically move old & not-unread messages there where they won't get in the way. Also note how most forum software with private messaging allows for limiting user inbox sizes. There are a few ways to solve the problem while keeping things sane.

like image 58
tadamson Avatar answered Sep 18 '22 12:09

tadamson


I'm agreeing with @MarkR here - in that initially the one table for messages is definitely the way to proceed. As time progresses and should you end up with a very large table then you can consider how to partition the table to best proceed. That's counter to the way I'd normally advise design, but we're talking about one table which is fairly simple - not a huge enterprise system.

A very long time ago (pre availability of SQL databases) I built a system that stored private and public messages, and I can confirm that once you split a message base logical entity into more than one everything¹ becomes a lot more complicated; and I doubt that a user per file is the right approach - the overheads will be massive compared to the benefit.

Avoid auto-increment[2] - and using natural keys is very important to the future scalability. Designing well to ensure that you can insert and retrieve without locking will be of more benefit.


¹ Indexing, threading, searching, purging/archiving.

² Natural keys are better if you can find one for your data as the autoincremented ID does not describe the data at all and databases are good at locating based on the primary key, so a natural primary key can improve things. Autoincrement can cause problems with a distributed database; it also leaks data when presented externally (to see the number of users registered just create a new account and check your user ID). If you can't find a natural key then a UUID (or GUID) may still be a better option - providing that the database has good support for this as a primary key. See When to use an auto-incremented primary key and when not to

like image 42
Richard Harrison Avatar answered Sep 19 '22 12:09

Richard Harrison