Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimal fill factor to prevent fragmentation

Currently I use a daily job to REORGANIZE 1000+ indexes with > 5% and < = 30% fragmentation and REBUILD indexes with > 30% fragmentation: https://msdn.microsoft.com/en-us/library/ms189858.aspx

All indexes are rebuild with a fill factor of 80%, but based on my latest check, the fragmentation levels of 100+ indexes are unchanged. Most of them with a high fragmentation. I tried to play with the fill factor values in a test environment, but unfortunately can't simulate the production environment.

I'm wondering if finding the 'best' fill-factor for each individual index is a good idea?

like image 587
Citizen SP Avatar asked Nov 20 '25 20:11

Citizen SP


1 Answers

[is] finding the 'best' fill-factor for each individual index is a good idea?

If the options are:

  1. Keep the current global 80% FILLFACTOR

    or

  2. Find the best FILLFACTOR for each table

then absolutely YES, find the most appropriate value for each table. Of course, had there been an option for:

  1. Put everything back to the default FILLFACTOR of 0 (same thing as 100) and apply a lower value--determined per table--to only those tables that should benefit from it

then I would have chosen #3 :-). Why? Because fragmentation and fillfactor can both be a bit complicated and tricky. And setting a globally low (80 is "low" given that the default is 100) value probably has a negative impact on a larger group of tables than the benefit you might be getting on the tables where it makes sense to have it.

Consider:

  • Fragmentation is one of several factors that influence performance: And this particular factor is a trade-off with the size of the table since it affects how many rows fit on a data page. The fewer rows on a data page means more pages need to be read from disk (not quick) to satisfy queries, and those pages will take up more memory (i.e. the Buffer Pool). In fact, there are a great many negative effects resulting from tables being larger than they should be, such as index maintenance / backup / restore / update stats / etc operations taking longer than they should.

    Setting the fill factor too low on large tables means that the tables will be even larger. The increase in disk reads and size required in the Buffer Pool needs to be balanced with the types of operations against the table. Singleton operations aren't affected so much by fragmentation, so if that is by far the majority use case, then you can err on the side of reducing the number of data pages required by the table. If you have a lot of range operations then you might need to err on the side of less fragmentation.

  • Data access patterns: Is the table being mostly appended to? If INSERTs are happening at the end of the table only, then fragmentation can only really occur if updates are occurring that either increase the size of rows with variable-length datatypes, or if the row moves position due to a change in value of 1 or more key fields.

    Also, deleting large amounts of rows can cause fragmentation. This happens when no rows are left on the data page. This is a situation where fragmentation not only cannot be mitigated by lowering the FILLFACTOR (even if all other conditions are favorable for lowering it), but would seem to actually be made worse by lowering it. If deletes occur frequently enough to leave empty data pages, then reducing the number of rows on those pages would increase the rate at which they become empty (i.e. between 3 data pages mostly filled with 500 rows each, and 5 data pages--with a lower FILLFACTOR--filled with only 300 rows each, deleting 700 rows will leave 1 empty data page in the first scenario but 2 empty data pages in the second scenario). And more empty data pages means more "unused" space.

  • Row size: A table with a row size of 100 bytes will have little "wasted" space due to trying to maintain a particular fillfactor. Meaning, if wanting to fill a page 80%, then a small row size will probably lead to actually filling the page 78% (as an example). But a row size of 3500 bytes will lead to only 1 row per page, which is really just under 50% used. And in the end, how many rows do you think need to be "reserved" for out of sequence inserts or rows that expand in size? A row size of 3500 bytes would only fit 1 more row on the page anyway so not much was really saved. A row size of 100 bytes on the other hand would reserve space for quite a few rows, and this is good, but only if it will be used.

  • Data distribution across the entire table: Meaning, let's say you have a table with 100 million rows. And let's also say that this table does allow for non-sequential inserts and/or updates that expand the size of the row. If the locations of the inserts or updates that could cause fragmentation are evenly distributed (or at least cover 50% of the table), then a lower FILLFACTOR could be useful. But, if the inserts and/or updates are confined to the most recent 5 million rows, then why reserve free space across the first 95 million rows when it will never be used? For example, if you have a table that is ordered on a DATETIME field, holds data for several years, and changes only occur in the most recent 2 months, then you might as well use 100%.

  • FILLFACTOR only applies when creating or rebuilding indexes: Newly created data pages (including those created from pages splits) will fill to 100% (or as close as it can get). Meaning, if you insert a lot of data such that several (or many) new datapages are created, and the inserts are done sequentially such that there is no fragmentation at the end of the inserts, but then somehow the rows are updated in such a way as to cause fragmentation, or maybe new inserts happen that are spread among the rows inserted a moment ago, then there is no way to prevent that fragmentation anyway (at least not without doing a REBUILD after every group of inserts, and that is just silly).

Hence, the situations that truly benefit from a lower (than the default 100%--expressed as 0) FILLFACTOR are far fewer than those that benefit from the default. So set them all back to 100 (or 0) and look for tables that fit the following profile:

  • Not small. This is very subjective, but I would think anything under 10,000 rows can be ignored (i.e. get the default)

  • Row size is under 1000 bytes (maybe even less than 1000?). If you are only reserving space for 1 or 2 rows then you are doing more harm than good.

  • Data access patterns that can cause fragmentation: non-sequential inserts, and updates that expand the size of the row or cause its location to move.

    Be careful to consider how much of the fragmentation is being caused by deletes that leave empty data pages. This type of fragmentation is adversely affected by lowering the FILLFACTOR, so deletes should make up, at most, a small proportion of the fragmentation.

  • Data distribution that results in the fragmentation getting distributed somewhat evenly across the index instead of being confined to 40% or less of it

Keep in mind:

  • Like many (or most?) other optimizations, the effects are proportional to the scale of the system. Small systems won't see much of an effect, but the larger the tables get, the more noticeable proper vs improper settings become.

  • It is certainly possible that a system naturally behaves in such a way that the "optimal" FILL FACTOR for all tables somehow does end up being the same--whether 80% or some other value. I am not sure how probable it is that such a system exists, but it is certainly within the realm of possibilities.

like image 127
Solomon Rutzky Avatar answered Nov 23 '25 10:11

Solomon Rutzky



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!