Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join table index performance improvements concerns

To start with, I have three tables, all with a primary key and some other data. These tables are joined in a join table with ~70 million rows: Table_1, Table_2, Table_3.

There is a primary key in this join table across Table_3_Id, Table_1_Id, Table_2_Id (in this order). There is also a non-clustered index over Table_1_Id, Table_2_Id, Table_3_Id (in this order) with a fill index of 95.

The data is filtered by Table_1_Id (I have a preset of ~100 of these Ids) and (through a join) with a property from Table_3 (so it uses Table_3_Id). Then, Table_1_Id and Table_2_Id values are returned. This is all done in one query in the Entity Framework.

This is the query:

  var items = dataContext.TablesJoin.AsNoTracking()
             .Join(dataContext.Table_3.AsNoTracking(), x => x.Table_3_Id, x => x.Id, (combi, scan) => new { combi, scan })
             .Where(x => possibleIds.Contains(x.combi.Table_1_Id) && otherIds.Contains(x.scan.Other_Id))
             .Select(x => new { FirstId = x.combi.Table_1_Id, SecondId = x.combi.DeviceInformationDevices_Id })
              ToList();

Because this is configuration is running on SQL Server Express, I'm running into some space problems (10GB is the max). The data is roughly 2GB, but the primary key and index are a total of 5GB. Because there is also more data in the database, I'm interested in reducing the size of the index while retaining performance.

After looking at everything, I had some concerns about what is exactly used. Because of of the join I'm not entirely sure how useful it is to include Table_3_Id in the non-clustered index. Removing this column from the index saves around 1GB of space.

Initially, I had this table as a clustered index (to safe space) but because the table has quite the amount of inserts (1000 / hour) it was very slow because of all the disk access as it had to constantly swap the 10GB of data around. Would it help if the fill factor was set lower (like 70) to get around this? Of course, it would also mean more wasted space but if this could save a lot on the index it might be worth it?

This table is used a lot and for performance the index is needed. Running it without an index takes a few minutes to execute, whereas with the index its done within 2 seconds.

Execution plan xml: http://pastebin.com/raw.php?i=tfUxgYrK

like image 367
The Cookies Dog Avatar asked Apr 30 '26 18:04

The Cookies Dog


1 Answers

You don't need the primary key for uniqueness. Your NCI already provides that uniqueness. You can get rid of one of those indexes. That should save some space.

You can save the space that the other index uses by making it clustered. You notices performance problems due to, apparently, randomly located inserts. This is plausible. Consider changing the column order of the index so that inserts will happen in only one or a few places. That way all pages affected will tend to be cached. The working set required by DML will be low.

The DML perf problems are probably not due to page splits. These cause mostly CPU load and fragmentation. The perf problems are probably because random pages must be read from disk.

1000 inserts per hour are not a lot. Consider accumulating writes into a delta table that is small and fully cached. Move the rows over to the main table in a background process. That way DML latency is off the critical path. Selects would either need to tolerate staleness or UNION ALL the delta table.

like image 138
usr Avatar answered May 02 '26 07:05

usr