I have a very typical situation. We have a table called Users which has a column called Branches (varchar 1000).
The organization can have 1000 branches. So if a user has access to branch 1, 5, and 10, the branches string would look like:
1000100001000000000......
(i.e. 1 for a position a User has branch access to based on the branch's number). Please do not advise better data storage options, this is coming to me from a legacy application that is deployed across continents.
Now given this background (and considering that there can be > 10000 users), I want to search for all Users who have access to any one of given set of branches, e.g. Find all users who have access to either branch 10, 65, 90 or 125.
One easy solution is to convert the desired set of branches (i.e. 10, 65, 90, 125) to a branch string (00000010100 etc), then use a scalar UDF to iterate over both the branch strings and return true at first matching occurence where 2 branch strings have 1, and false if there is not a 1 at common position.
Other than that, I also have an option of searching in application in C#. Some of these users are privileged (approx 1000 or more) and their data is cached in application as it is accessed very frequently. But for other users that are not privileged, data is only in db.
I have 2 questions here: 1) For a db search, is there a better way other than the UDF approach I mentioned. 2) For privileged users, what would be better in terms of performance, search in application (which further can be based on a for loop on branch strings like in UDF, or as a Linq Intersect operator on 2 branch arrays, i.e. a Linq Intersect on [1,5,9,50,80,200] and [6,90,256,300] etc.) Would a db search produce faster results or an application based search?
Consider there might be other parameters for search in both cases, e.g. Last name starts with.
My current approach is to filter rows in db for both situations first on other parameters (like Last name starts with). Then use a scalar UDF to filter this result-set based on branches and then return the results.
Do it in SQL, it will be only 100 times faster than doing it in C# or other front end.
Use the built-in numbers table to break the long string into positions (number series goes up to 2047).
Sample tables
create table users (userid int)
insert users select 1 union all select 2
create table permission (userid int, bigstr varchar(1000))
insert permission
select 1, REPLICATE('0', 56) + '1' -- 57th
+ REPLICATE('0', 32) + '1' -- 90th
+ REPLICATE('0', 64) + '1' -- 155th
+ REPLICATE('0', 845)
insert permission
select 2, REPLICATE('0', 66) + '1' -- 67th
+ REPLICATE('0', 98) + '1' -- 166th
+ REPLICATE('0', 657) + '1' -- 824th
+ REPLICATE('0', 176)
Sample showing all the matching permissions against a list
select *
from users u
inner join permission p on p.userid=u.userid
inner join master..spt_values v on v.type='p'
and SUBSTRING(p.bigstr,v.number,1) = '1'
and v.number between 1 and LEN(p.bigstr) -- or 1000 if it is always 1000
where v.number in (57,90,824)
To find users who have at access to at least one branch in the list:
select distinct u.userid
from users u
inner join permission p on p.userid=u.userid
inner join master..spt_values v on v.type='p'
and SUBSTRING(p.bigstr,v.number,1) = '1'
and v.number between 1 and LEN(p.bigstr) -- or 1000 if it is always 1000
where v.number in (57,90,824)
etc..
You might want to construct strings of the form __1_1____1%
for a LIKE query to find all users who have access to branches 3, 5, and 10.
To construct these strings the easiest way is to start with a string of _
characters that's as long as the largest branch number in your set (or larger), and then replace individual _
characters with 1
characters, and then append the % at the end.
As for whether this is faster than doing a loop in the database or a loop in your application, I think your best approach is to just test it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With