I am currently planning a new system in PHP/MySQL and want to make sure my database can handle the amount of data that I am planning to store. One of the features of my new project is a "messages" feature like Facebook. I want to make sure I create the best possible experience for the end user. The website will eventually handle 1000's of users with potentially millions of messages collectively. What would be the best approach for the database design? Is MySQL even the right database to use?
MySQL is the primary database used by Facebook for storing all the social data. They started with the InnoDB MySQL database engine and then wrote MyRocksDB, which was eventually used as the MySQL Database engine.
Given the data modeling flexibility of SQL and the ubiquity of MySQL when FB started, this social graph was initially built as a PHP application powered by MySQL as the persistent database and memcache as a “lookaside” cache.
Well, we have an open-source application, which helps us to manage this enormous amount of data, and it's under Apache License, namely known as Apache Hadoop.
Database design is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate. With this information, they can begin to fit the data to the database model. Database management system manages the data accordingly.
MySQL has no problem with millions or hundreds of millions of records as long as you design you database correctly.
That being said, a "messages feature like Facebook" is a pretty broad definition. Generally, you would define a messages
table that links each message to the user that created it (ie, have a userId
column in the messages table). If you want messages to go to multiple users, you have a message_recipients
table defining the 1-to-many relationship by storing multiple records consisting of the messageId
and a recipientId
. Add the proper indexes to these tables and you're 80% of the way there.
That being said, that remaining 20% can be a killer. Unfortunately, how you use your database is going to determine what else you need to do, and you'd have to provide a lot more detail about your application before those judgments can be made. For example, you might wish to consider having auto-archiving solution which keeps the main table relatively small, and moves old data to backup tables that can be accessed if necessary. You probably won't need this right away, but it could help in the future.
Facebook started with MySQL and they only moved to Cassandra when they had 7TB of inbox data for over 100 million users.
Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With