I'm building a system that has the potential to get hammered pretty hard with hits and traffic. It's a typical Apache/PHP/MySQL setup.
I have built plenty of systems before, but never had a scenario where I really had to make decisions about scalability at this size. I have dozens of questions about building a system of this magnitude, but for this particular question, I am trying to decide what to use as the data type.
Here is the 100ft view:
We have a table which (among other things) has a description field. We have decided to limit it to 255 characters. It will be searchable (i.e., show me all entries with a description that contains ...). Problem: this table is likely to have millions upon millions of entries at some point (or so we think).
I have not yet figured out the strategy for the search (I am guessing the MySQL LIKE operator is likely to be slow and/or a resource hog for such a large number of records), but that's for another SO question. For this question, I am wondering what the pros and cons are of creating this field as a TINYTEXT, VARCHAR, or CHAR.
I am not a database expert, so any and all commentary is helpful. Thanks -
TINYTEXT can be preferable to VARCHAR when storing data of up to 255 bytes that has an inconsistent length and never needs to be used as a sorting criterion.
A CHAR field is a fixed length, and VARCHAR is a variable length field. This means that the storage requirements are different - a CHAR always takes the same amount of space regardless of what you store, whereas the storage requirements for a VARCHAR vary depending on the specific string stored.
TINYTEXT can store up to 255 bytes, which is 255 characters only in a single-byte character set. It is suitable for storing summaries of articles, short notices, captions, etc. It has a 1-byte length overhead. MEDIUMTEXT can store up to 16,777,215 bytes (2^24 - 1), or about 16 MB of data.
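To make the storage difference concrete, here is a rough Python sketch of the per-value byte cost for the three candidate types. It assumes a single-byte character set such as latin1 and ignores per-row overhead, storage engine, and row format, all of which change the real on-disk numbers. The function name `storage_bytes` and its parameters are made up for illustration.

```python
def storage_bytes(column_type: str, text: str, declared_len: int = 255) -> int:
    """Estimate the bytes one value occupies, assuming a single-byte charset.

    Illustrative only: real usage depends on storage engine, row format,
    and character set, none of which are modelled here.
    """
    data_len = len(text.encode("latin-1"))
    kind = column_type.upper()

    if kind == "TINYTEXT":
        # TINYTEXT has no declared length; it holds at most 255 bytes
        # plus a 1-byte length prefix.
        if data_len > 255:
            raise ValueError("value does not fit in TINYTEXT")
        return data_len + 1

    if data_len > declared_len:
        raise ValueError("value longer than the declared column length")

    if kind == "CHAR":
        # Fixed length: the value is padded out to the declared length.
        return declared_len

    if kind == "VARCHAR":
        # Variable length: the data plus a length prefix of 1 byte when the
        # column can hold at most 255 bytes, otherwise 2 bytes.
        prefix = 1 if declared_len <= 255 else 2
        return data_len + prefix

    raise ValueError(f"unsupported column type: {column_type}")


if __name__ == "__main__":
    sample = "short caption"  # 13 characters
    for kind in ("CHAR", "VARCHAR", "TINYTEXT"):
        print(kind, storage_bytes(kind, sample))
```

For a short description in a 255-character column, CHAR pays for the full 255 bytes while VARCHAR and TINYTEXT pay only for the data plus one length byte, so storage alone does not separate the latter two.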
In most circumstances, VARCHAR provides better performance, is more flexible, and can be fully indexed, whereas an index on a TEXT column must specify a prefix length. If you need to store longer strings, use MEDIUMTEXT or LONGTEXT, but be aware that columns of these types can hold very large amounts of data.
Use a CHAR.
BLOBs and TEXTs are stored outside the row, so there will be an access penalty to reading them. VARCHARs are variable length, which saves storage space but could introduce a small access penalty (since the rows aren't all fixed length).
If you create your index properly, however, either VARCHAR or CHAR can be stored entirely in the index, which will make access a lot faster.
See: varchar(255) v tinyblob v tinytext
And: http://213.136.52.31/mysql/540
And: http://forums.mysql.com/read.php?10,254231,254231#msg-254231
And: http://forums.mysql.com/read.php?20,223006,223683#msg-223683
Incidentally, in my experience the MySQL REGEXP operator is a lot faster than LIKE for simple queries (i.e., SELECT ID FROM SOME_TABLE WHERE SOME_COLUMN REGEXP 'search.*'), and obviously more versatile.