I'm looking at implementing multi-tenancy in SQL Server. I'm considering the shared database, shared schema and Tenant View Filter approach described here. The only drawback is a fragmented connection pool...
Per http://msdn.microsoft.com/en-au/architecture/aa479086, the Tenant View Filter is described as follows:
"SQL views can be used to grant individual tenants access to some of the rows in a given table, while preventing them from accessing other rows.
In SQL, a view is a virtual table defined by the results of a SELECT query. The resulting view can then be queried and used in stored procedures as if it were an actual database table. For example, the following SQL statement creates a view of a table called Employees, which has been filtered so that only the rows belonging to a single tenant are visible:
CREATE VIEW TenantEmployees AS
SELECT * FROM Employees WHERE TenantID = SUSER_SID()
This statement obtains the security identifier (SID) of the user account accessing the database (which, you'll recall, is an account belonging to the tenant, not the end user) and uses it to determine which rows should be included in the view"
Thinking this through: if we have one database storing, say, 5,000 different tenants, then the connection pool is completely fragmented. Every time a request is sent to the database, ADO.NET needs to establish a new connection and authenticate (remember, connection pooling works per unique connection string), and this approach means you have 5,000 connection strings…
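To make the concern concrete, here is a minimal sketch of the behaviour I'm describing; the server name and the per-tenant login convention are illustrative, not from the article:

using System.Data.SqlClient;

class FragmentedPoolDemo
{
    // Hypothetical helper: the Tenant View Filter pattern needs one SQL
    // login per tenant, so every tenant yields a unique connection string.
    static string BuildTenantConnectionString(string tenantLogin, string password)
    {
        return "Server=myserver;Database=SharedTenantDb;" +
               "User ID=" + tenantLogin + ";Password=" + password + ";";
    }

    static void Main()
    {
        // Two tenants => two distinct connection strings => two separate
        // pools. Idle connections in one pool can never serve the other.
        using (var conn = new SqlConnection(BuildTenantConnectionString("Tenant0001", "pw1")))
        {
            conn.Open(); // creates (or draws from) Tenant0001's pool only
        }
        using (var conn = new SqlConnection(BuildTenantConnectionString("Tenant0002", "pw2")))
        {
            conn.Open(); // a brand-new pool and a fresh authentication handshake
        }
        // Scale this to 5,000 tenants and you hold 5,000 pools against one server.
    }
}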
How worried should I be about this? Can someone give me some real-world examples of how significant an impact the connection pool has on a busy multi-tenant database server (say, servicing 100 requests per second)? Can I just throw more hardware at the problem and make it go away?

Thoughts?
Instead of opening and closing connections for every request, connection pooling uses a cache of database connections that can be reused when future requests to the database are required. It lets your database scale effectively as the data stored there and the number of clients accessing it grow.
A connection pool is created for each unique connection string. When a pool is created, multiple connection objects are created and added to the pool so that the minimum pool size requirement is satisfied. Connections are added to the pool as needed, up to the maximum pool size specified (100 is the default).
Four parameters control most of the connection pooling behavior:

Connect Timeout - controls the wait period in seconds when a new connection is requested; if this timeout expires, an exception is thrown. The default is 15 seconds.

Max Pool Size - specifies the maximum size of your connection pool. The default is 100.

Min Pool Size - specifies the minimum number of connections maintained in the pool. The default is 0.

Pooling - controls whether pooling is used at all. The default is true.
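For illustration, here is how those keywords might look in an actual connection string (the server and database names are placeholders, and the values are just examples):

using System.Data.SqlClient;

class PoolingParametersDemo
{
    static void Main()
    {
        // The four keywords above, set explicitly. Every connection built
        // from this exact string shares a single pool.
        var connectionString =
            "Server=myserver;Database=SharedTenantDb;Integrated Security=true;" +
            "Connect Timeout=15;" +   // seconds to wait for a usable connection
            "Max Pool Size=100;" +    // upper bound on pooled connections
            "Min Pool Size=5;" +      // connections created up front with the pool
            "Pooling=true;";          // set to false to bypass pooling entirely

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open(); // disposing returns it to the pool rather than closing it
        }
    }
}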
My suggestion would be to develop a solid API over your database. Scalability, modularity, extensibility and accounting are the main reasons. A few years down the line you may find yourself swearing at yourself for tying everything to SUSER_SID(). For instance, consider multiple tenants managed by one account, or white-label situations...

Have a data access API which will take care of authentication. You can still do authorisation at the DB level, but that's a whole different topic. Have users, and perhaps groups, and grant them permissions to tenants.
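As a minimal sketch of that idea (assuming a single application-level login and TenantID stored as a plain key rather than a SID; the repository shape and table are illustrative, not a prescribed design):

using System.Collections.Generic;
using System.Data.SqlClient;

// One shared application login means one connection string and therefore
// one pool, no matter how many tenants there are. The tenant filter moves
// from SUSER_SID() into an explicit parameter supplied by the API layer
// after it has authenticated the caller.
class EmployeeRepository
{
    private readonly string _connectionString;

    public EmployeeRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public List<string> GetEmployeeNames(int tenantId)
    {
        var names = new List<string>();
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Name FROM Employees WHERE TenantID = @TenantID", conn))
        {
            cmd.Parameters.AddWithValue("@TenantID", tenantId);
            conn.Open(); // always drawn from the same shared pool
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    names.Add(reader.GetString(0));
            }
        }
        return names;
    }
}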
Nevertheless, for huge projects you'll still find it better to have a single DB per big player.

I realise I did not answer your main question about fragmented connection pool performance, but I'm convinced there are many valid arguments for not going down that path anyway.