For a multi-tenant single shared database should the tenantid field be included in the primary key and clustered index? Or is adding an additional index on tenantid just as performant?
We’re running into performance issues on a production system whose only index is the clustered index on the primary key.
ALL sql select statements start with tenantid in their linq to entities statements such as
invoiceitems.tenantid = thecurrenttenantid order by invoicedate
Tenants (tenantid uniqueidentifier primary key, tenantname) Foreign Keys (tenantid) Indexes(Clustered on tenantid)
Customers (tenantid uniqueidentifier, customerid uniqueidentifier primary key, customername varchar(50)) Foreign Keys (tenantid, customerid) Indexes (clustered on customerid)
Invoices (tenantid uniqueidentifier, invoiceid uniqueidentifier primary key, billcustomerid uniqueidentifier, shipcustomerid uniqueidentifier, invoicedate datetime) Foreign Keys (tenantid, billcustomerid, shipcustomerid) Indexes (clustered on invoiceid)
InvoiceItems (tenantid uniqueidentifier, invoiceitemid uniqueidentifier primary key, invoiceid uniqueidentifier, lineitemorder int) Foreign Keys (tenantid, invoiceid) Indexes (clustered on invoiceitemid)
SqlAzure requires that each table has a clustered index so it's currently just on primarykeyid since that’s the default. Right now that is the only index on each table. There are various foreign keys in the tables throughout the system and none of the foreign key table fields are indexed.
We're trying to resolve some performance issues right now and were wondering what would be the best clustered index and if any other indexes might be helpful. We're hoping we don't have to change the existing clustered index unless we absolutely have to but we are willing to do so. In SqlAzure AFAIK you cannot simply adjust the clustered index in an existing table - you have to create a new table with the desired clustered index and insert all records from the old table to the new table (and handle all of the foreign key constraints and other table dependencies).
Some select statements only filter on tenantid and sort; others add join conditions when bringing in child tables, for example:
invoiceitems.tenantid = thecurrenttenantid and invoice.invoiceid = invoiceitems.invoiceid order by invoicedate
Here are a few ideas (we're open to others besides this) - which of these would be best and why?
To speed up access to the tenant's records
Option 1 - Add a non-clustered index on tenantid
Invoices (tenantid uniqueidentifier, invoiceid uniqueidentifier primary key, billcustomerid uniqueidentifier, shipcustomerid uniqueidentifier, invoicedate datetime) Foreign Keys (tenantid, billcustomerid, shipcustomerid) Indexes (clustered on invoiceid, non-clustered on tenantid)
Option 2 - Change the primary key from primaryid to tenantid + primaryid and change the clustered index to tenantid + primaryid.
Invoices (tenantid uniqueidentifier primary key, invoiceid uniqueidentifier primary key, billcustomerid uniqueidentifier, shipcustomerid uniqueidentifier, invoicedate datetime) Foreign Keys (tenantid, billcustomerid, shipcustomerid) Indexes (clustered on tenantid + invoiceid)
To speed up joins
Option 3 - Add non-clustered indexes on all foreign key fields only on foreignkeyid.
Invoices (tenantid uniqueidentifier, invoiceid uniqueidentifier primary key, billcustomerid uniqueidentifier, shipcustomerid uniqueidentifier, invoicedate datetime) Foreign Keys (tenantid, billcustomerid, shipcustomerid) Indexes (clustered on invoiceid, non-clustered on billcustomerid, non-clustered on shipcustomerid)
Option 4 - Change all foreign keys from foreignkeyid to tenantid + foreignkeyid and add an index on tenantid + foreignkeyid
Invoices (tenantid uniqueidentifier, invoiceid uniqueidentifier primary key, billcustomerid uniqueidentifier, shipcustomerid uniqueidentifier, invoicedate datetime) Foreign Keys (tenantid, tenantid + billcustomerid, tenantid + shipcustomerid) Indexes (clustered on invoiceid, non-clustered on tenantid + billcustomerid, non-clustered on tenantid + shipcustomerid)
To speed up often used queries like select fields from invoices where tenantid = value order by invoicedate
Option 5 - add indexes on most often used sort order fields within each table besides tenantid.
Invoices (tenantid uniqueidentifier, invoiceid uniqueidentifier primary key, billcustomerid uniqueidentifier, shipcustomerid uniqueidentifier, invoicedate datetime) Foreign Keys (tenantid, billcustomerid, shipcustomerid) Indexes (clustered on invoiceid, non-clustered on invoicedate)
Option 6 - add a non-clustered index on tenantid + “most often used sort order field” within each table.
Invoices (tenantid uniqueidentifier, invoiceid uniqueidentifier primary key, billcustomerid uniqueidentifier, shipcustomerid uniqueidentifier, invoicedate datetime) Foreign Keys (tenantid, billcustomerid, shipcustomerid) Indexes (clustered on invoiceid, non-clustered on tenantid + invoicedate)
It seems like you've given this lots of thought. Regardless of what I or anyone else says the only way to know for sure is to measure it yourself. In which case this becomes less of a SQL Azure question and more of a general SQL Server query optimisation question.
For your situation there are a couple of tips to get you started. As you're using LINQ, you don't have direct access to the actual queries that are being run in SQL. You might think you know what the query should look like, but depending on which version of EF you're using, it can make some interesting decisions on how to structure the query. To find out exactly what queries are being run you'll need to use SQL Profiler or Extended Events. SQL Profiler doesn't work against SQL Azure, so you'll either need to use Extended Events or get a copy of your DB onto a local server somewhere and run your application pointing at the local copy. The export data-tier application and the related import in SQL Server Management Studio (SSMS) are very useful for this.
Once you have the actual queries, you can run them in SSMS against the database in Azure to get the execution plan. You can then make a change to your indexes, run the query again and compare the plans. If you don't want to mess with your main development DB you can create a copy pretty easily in a number of ways, including using the CREATE DATABASE xxx AS COPY OF yyyy command.
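As a rough sketch of that measure-and-compare loop, something like the following could be run in SSMS against the copy, once before and once after an index change. The SELECT is only a stand-in: the real statement should be whatever EF actually sends, captured as described above, and the tenant GUID is a placeholder.

```sql
-- Report logical reads and CPU/elapsed time in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder tenant id; substitute a real tenant with a typical row count.
DECLARE @tenantid uniqueidentifier = '00000000-0000-0000-0000-000000000001';

-- Stand-in for the captured EF query (assumption: the shape from the question).
SELECT invoiceid, billcustomerid, shipcustomerid, invoicedate
FROM Invoices
WHERE tenantid = @tenantid
ORDER BY invoicedate;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads for Invoices before and after the change, alongside the actual execution plan, gives a fairly direct view of whether a scan has turned into a seek.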
Don't be tempted to do your optimisations on the local DB. SQL Azure has a different performance profile from most on-premises SQL Server installs.
With all of that said, if all of your queries are always going to contain the tenant ID, then yes, I would expect that including it as the first part of the clustered index would improve your query performance. For all other indexes, I'm not so sure, so I'd measure, measure, measure. Also remember that indexes don't come for free: every one you create hurts your write performance and increases the size of your DB, so I wouldn't go nuts and index everything.
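For illustration, here is roughly what those two ideas look like in T-SQL for the Invoices table from the question. The index, constraint and table names are made up, the nullability of each column is an assumption, and the foreign keys are left out for brevity, so treat it as a sketch to measure rather than a recommendation.

```sql
-- Candidate to measure (the question's Option 6): supports
-- WHERE tenantid = @t ORDER BY invoicedate without touching the clustered index.
CREATE NONCLUSTERED INDEX IX_Invoices_tenantid_invoicedate
    ON Invoices (tenantid, invoicedate);

-- The question's Option 2, built as a new table to copy the rows into:
-- tenantid leads the clustered primary key, so one tenant's rows sit together.
CREATE TABLE Invoices_New (
    tenantid       uniqueidentifier NOT NULL,
    invoiceid      uniqueidentifier NOT NULL,
    billcustomerid uniqueidentifier NULL,   -- nullability assumed
    shipcustomerid uniqueidentifier NULL,   -- nullability assumed
    invoicedate    datetime         NULL,   -- nullability assumed
    CONSTRAINT PK_Invoices_New PRIMARY KEY CLUSTERED (tenantid, invoiceid)
);
```

The non-clustered index is the cheaper experiment because it needs no table rebuild, so it is probably worth measuring first.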
Finally, don't worry about using GUIDs for your PKs. If your DB gets large enough that you need to federate it by tenant ID (which your structure looks like it will handle quite nicely), IDENTITY columns stop being an option.