I know that when using a TEXT-type field in a MySQL table, the data is not stored inline; only a 'pointer' is stored in the row. I only want to retrieve the text field infrequently, so is it better to keep it in the same table but omit it from query results, or to keep it in a separate table and join on that table when I want to read it?
This table could potentially have billions of rows, be partitioned, and have large (100 KB to 1 MB) text field values.
A billion rows with a field that is 100 KB is, to say the least, big. That comes to 100 TB of data (using decimal units, where 1 TB = 10^12 bytes), and the 1 MB upper end of your range would be ten times that. According to the documentation:
The InnoDB storage engine maintains InnoDB tables within a tablespace that can be created from several files. This enables a table to exceed the maximum individual file size. The tablespace can include raw disk partitions, which permits extremely large tables. The maximum tablespace size is 64TB.
In other words, you may have bigger problems than performance. You will probably be spreading the table across multiple partitions.
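The size estimate above can be checked with a quick calculation. This is only a back-of-the-envelope sketch: it assumes exactly one billion rows, decimal units, and no storage overhead, and the column aliases are made up for illustration.

```sql
-- Rough total size of the text data alone, in decimal terabytes (10^12 bytes).
-- Assumes 1,000,000,000 rows and ignores row, page, and index overhead.
SELECT
    1000000000 * 100000  / 1000000000000 AS tb_at_100kb,  -- about 100 TB
    1000000000 * 1000000 / 1000000000000 AS tb_at_1mb;    -- about 1000 TB
```

Either figure is above the 64TB tablespace limit quoted from the documentation.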
If you are only occasionally retrieving the text and never using it for searches, I would suggest that you store it in a separate table. That way, you can customize that table for access to these records. You'll have a primary key used for reference and all references will be through that id.
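A minimal sketch of that split might look like the following. The table and column names (`item`, `item_text`, `body`) are hypothetical, and the column types are assumptions based on the sizes in the question, not a tested design for billions of rows.

```sql
-- Hypothetical base table: the small columns that most queries touch.
CREATE TABLE item (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    userid      BIGINT UNSIGNED NOT NULL,
    createddate DATETIME        NOT NULL,
    PRIMARY KEY (id),
    KEY idx_item_createddate (createddate)
) ENGINE=InnoDB;

-- Hypothetical side table: one row per item, holding only the large text.
-- MEDIUMTEXT holds up to 16 MB; plain TEXT tops out at 64 KB.
-- No FOREIGN KEY is declared, on the assumption that the tables will be
-- partitioned: InnoDB does not support foreign keys on partitioned tables.
CREATE TABLE item_text (
    item_id BIGINT UNSIGNED NOT NULL,
    body    MEDIUMTEXT      NOT NULL,
    PRIMARY KEY (item_id)
) ENGINE=InnoDB;

-- Occasional read of the text for a single record, joined on the id.
SELECT i.id, i.userid, i.createddate, x.body
FROM item AS i
JOIN item_text AS x ON x.item_id = i.id
WHERE i.id = 12345;
```

Queries that never need the text touch only `item`, so a stray `select *` on the base table cannot pull the large values along.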
If you are using the text for searches, particularly searches combined with the "fixed" data, then my architectural preference would be to include it in the base table to facilitate the searching across fields.
However, even with this preference, it is probably safer to put it in a different table. For instance, MySQL typically materializes derived tables (subqueries in the FROM clause), and it is very typical to use * in such a subquery. Consider a simple case: a query to get the 1000 most recent records ordered by userid:
select t.*
from (select t.*
      from t
      order by createddate desc  -- newest rows first
      limit 1000
     ) t
order by userid
The use of t.* means that the text column would also be retrieved. So a query that might take a fraction of a second (with an index) would have to read and write 1000 * 100 KB = 100 MB of data (at least). This would probably take a bit longer.
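If the text does stay in the base table, one way to avoid this cost is to name the needed columns instead of using *. The sketch below assumes the table has an `id` column (a hypothetical name) alongside the `userid` and `createddate` columns used in the query above.

```sql
-- Same "1000 most recent records, ordered by userid" query, but listing
-- columns explicitly so the large text column is never read or materialized.
select t.id, t.userid, t.createddate
from (select id, userid, createddate
      from t
      order by createddate desc
      limit 1000
     ) t
order by userid;
```

This relies on every query author remembering to avoid *, which is why the separate table is the safer default.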
In conclusion, I would advocate keeping the text column in the base table when it is often searched together with other columns, for example in a database of abstracts of scientific papers. For really large data, I would put it in a separate table, so I could better manage the storage in extreme cases.