Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL 5.5 : Which one of the following is a better storage for a text/varchar field in innodb?

Tags:

mysql

innodb

Requirement :

Page#1 -> Display users and 1-2 line preview of their latest 10 blog posts

Page#2 -> Display single blogpost with full text.

Method 1 :

MySQL table ->   userid -> varchar 50
                 post_id -> integer
                 post_title -> varchar 100
                 post_description -> varchar 10000

for page#1 , select user_id, post_title , post_description from blog_table . and substring of post_description is used to show preview in the listing.

for page#2 , select user_id , post_title, post_description where post_id = N

Method 2 :

 MySQL table ->   userid -> varchar 50
                  post_id -> integer
                  post_title -> varchar 100
                  post_brief -> varchar 250
                  post_description -> text

for page#1 , select user_id, post_title , post_brief from blog_table .

for page#2 , select user_id , post_title, post_description where post_id = N

Does storing two columns, one brief as varchar and one full as text ( since it accesses the file system , and should be queried only when needed ) , worth the performance benefit ?

Since, method 2, will store only pointer to the text in the row, whereas Method 1 will store full varchar 10K string in the row. Does it affect the amount of table data which can reside in RAM , hence affect read performance of queries ?

like image 535
DhruvPathak Avatar asked Feb 21 '13 07:02

DhruvPathak


3 Answers

The performance of SQL queries mostly depends on JOINs, WHERE clauses, GROUP BYs and ORDER BYs, not on the columns retrieved. The columns only have a noticable effect on the query's speed if significantly more data is retrieved which might have to go over a network to be processed by your programming language. That is not the case here.

Short answer: The difference in performance between the two proposed setups is likely to be very small.

For good speed, your post_id column should have a (unique) index. You are not selecting, sorting or grouping by any other column, so data can come straight from the table, which is a very fast process.

You are talking about "pages" here, so I'm guessing those are going to be presented to users - it seems unlikely that you want to show a table of thousands of blog posts on the same page to a human, therefor you probably do actually have ORDER BY and/or LIMIT clauses in your statements that you didn't include in your question.

But lets look a bit deeper into this whole thing. Lets assume we are actually reading tons of TEXT columns directly from hard disk, wouldn't we hit the drive's maximum reading speed? Wouldn't retrieving just a VARCHAR(250) be faster, especially since it saves you the extra LEFT() call?

We can get the LEFT() call off the table real quick. String functions are really fast - after all, it is the CPU just cutting off some of the data, which is a really fast process. The only times when they produce a noticable delay is when they're used in WHERE clauses, JOINs etc., but that is NOT because those functions are slow, but because they have to be run lots (possibly millions) of times in order to produce even a single row of results, and even more so, because those uses often prevent the database from using its indexes properly.

So in the end it comes down to: how fast can MySQL read the table contents from the database. And that in turn depends on the storage engine you are using and its settings. MySQL can use a number of storage engines, including (but not limited to) InnoDB and MyISAM. Both of these engines offer different file layouts for large objects such as TEXT or BLOB columns (but funnily enough, also VARCHARs). If the TEXT column is stored in a different page than the rest of the row, the storage engine has to retrieve two pages for every row. If it is stored along with the rest, it'll be just one page. For sequential processing this could be a major change in performance.

Here's a bit of background reading on that:

  • Blob Storage in InnoDB
  • MyISAM Dynamic vs. Compressed Data File Layouts

Long Answer: It depends :)

You would have to do a number of benchmark tests on your own hardware to actually make the call as to which layout is actually quicker. Given that the second setup introduces redundancy with its additional column, it is likely to perform worse in most scenarios. It will perform better if - and only if - the table structure allows the shorter VARCHAR column to fit into the same page on disk while the long TEXT column would be on another page.

Edit: More on TEXT columns and performance

There seems to be a common misconception about BLOBs and in-memory processing. Quite a number of pages (including some answers here on StackOverflow - I'll try to find them, and give an additional comment) state that TEXT columns (and all other BLOBs) cannot be processed in memory by MySQL, and as such are always a performance hog. That is not true. What's really happening is this:

If you run a query that involves a TEXT column and that query needs a temporary table to be processed, then MySQL will have to create that temporary table on disk rather than in memory, because MySQL's MEMORY storage engine cannot handle TEXT columns. See this related question.

The MySQL documentation states this (the paragraph is the same for all versions from 3.2 through 5.6):

Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types (see Section 8.4.3.3, “How MySQL Uses Internal Temporary Tables”). Use of disk incurs a performance penalty, so include BLOB or TEXT columns in the query result only if they are really needed. For example, avoid using SELECT *, which selects all columns.

It is the last sentence that confuses people - because that is just a bad example. A simple SELECT * will not be affected by this performance problem because it won't use a temporary table. If the same select was for example ordered by a non-indexed column, it the would have to use a temporary table and would be affected by this problem. Use the EXPLAIN command in MySQL to find out whether a query will need a temporary table or not.

By the way: None of this affects caching. TEXT columns can be cached just like anything else. Even if a query needed a temporary table and that had to be stored on disk, the result could still be cached if the system had the resources to do so, and the cache is not invalidated. In this regard, a TEXT column is just like anything else.

Edit 2: More on TEXT columns and memory requirements ...

MySQL uses the storage engine to retrieve records from disk. It will then buffer the results and hand them sequentially to the client. The following assumes that this buffer ends up in memory and not on disk (see above why)

For TEXT columns (and other BLOBs), MySQL will buffer a pointer to the actual BLOB. Such a pointer uses only a few bytes of memory, but requires the actual TEXT content to be retrieved from disk when the row is handed to the client. For VARCHAR columns (and everything else but BLOBs), MySQL will buffer the actual data. This will usually use more memory, because most of your texts are going to be more than just a few bytes. For calculated columns, MySQL will also buffer the actual data, just like with VARCHARs.

A couple of notes on this: Technically, the BLOBs will also be buffered when they are handed over to the client, but only one at a time - and for large BLOBs possibly not in its entirety. Since this buffer gets freed after each row, this does not have any major effect. Also, if a BLOB is actually stored in the same page as the rest of the row, it may end up being treated like VARCHARs. To be honest, I've never had the requirement to return lots of BLOBs in a single query, so I never tried.

Now lets actually answer the (now edited) question:

Page #1. Overview of users and short blog post snippets.

Your options are pretty much these queries

SELECT userid, post_title, LEFT(post_description, 250) FROM `table_method_1`  <-- calculated based on a VARCHAR column
SELECT userid, post_title, LEFT(post_description, 250) FROM `table_method_2`  <-- calculated based on the TEXT column
SELECT userid, post_title, post_brief FROM `table_method_2`                   <-- precalculated VARCHAR column
SELECT userid, post_title, post_description FROM `table_method_2`             <-- return the full text, let the client produce the snippet

The memory requirements of the first three are identical. The fourth query will require less memory (the TEXT column will be buffered as a pointer,) but more traffic to the client. Since traffic usually is over a network (expensive in terms of performance,) this tends to be slower than the other queries - but your mileage may vary. The LEFT() function on the TEXT column might be sped up by telling the storage engine to use an inlined table layout, but this will depend on the average length of text being stored.

Page #2. A single blog post

SELECT userid, post_title, post_description FROM `table_method_1` WHERE post_id=... <-- returns a VARCHAR
SELECT userid, post_title, post_description FROM `table_method_2` WHERE post_id=... <-- returns a TEXT

The memory requirements are low to begin with, since only one single row will be buffered. For the reasons stated above the second will require a tiny bit less memory to buffer the row, but some additional memory to buffer a single BLOB.

In either case, I'm pretty sure you're not concerned with the memory requirements of a select that'll only return a single row, so it does not really matter.

Summary

If you have text of arbitrary length (or anything that requires more than a few kilobytes), you should use TEXT columns. That's what they're there for. The way MySQL handles those columns is beneficial most of the time.

There are only two things to remember for everyday use:

  • Avoid selecting TEXT columns, BLOB columns and all other columns that may have lots of data (and yes, that includes a VARCHAR(10000)) if you don't actually need them. The habit of "SELECT * FROM whatever" when all you need is a couple of values will put a lot of unnecessary stress on the database.
  • When you are selecting TEXT columns or other BLOBs, make sure the select does not use a temporary table. Use the EXPLAIN syntax when in doubt.

When you stick to those rules, you should get fairly decent performance from MySQL. If you need further optimation than that, you'll have to look at the finer details. This will include storage engines and respective table layouts, statistical information on the actual data, and knowledge about the hardware involved. From my experience, I could usually get rid of performance hogs without having to dig that deep.

like image 58
Hazzit Avatar answered Nov 05 '22 00:11

Hazzit


Method 2 looks better but if you are storing HTML there post_brief could also be TEXT column, if it's pure text you could store everything in one column and use

SELECT user_id, post_title, LEFT(post_description,255) AS post_brief FROM blog_table.

Consider MySQL 5.6, it is much faster and you can use FULLTEXT Index in InnoDB, so in case of searching posts it will help a lot

like image 45
demsey Avatar answered Nov 04 '22 23:11

demsey


Option 2 looks good to me also. As the blogpost is going to be huge, applying function on that columns should also take time.

And if you ask me, the data type of the post_description should be blob/text. Eventhough blob columns doesnot support search, that would be better option.

Only disadvantage of having as two columns is, you have to make sure both desc and brief are in sync(May be you can make it as a feature too )

like image 28
georgecj11 Avatar answered Nov 05 '22 01:11

georgecj11