I'm trying to figure out how best to query against a schema that consists of one central table, plus a number of "attribute" tables (sorry, not sure of the best terminology here) that record one-to-many relationships. In the business layer, each of these tables corresponds to a collection that may contain zero or more elements.
Right now the code I'm looking at retrieves the data by getting a list of values from the master table, then looping over it and querying each of the "accessory" tables to populate these collections.
I'd like to try and get it down to a single query if I can. I tried using multiple LEFT JOIN
s. But this effectively joins against a cross product of the values in the accessory tables, which leads to an explosion of rows - especially when you add a few more joins. The table in question includes five such relationships, so the number of rows returned for each record is potentially enormous, and almost entirely composed of redundant data.
Here's a smaller synthetic example of some tables, data, the query structure I'm using, and results:
Database structure & data:
create table Containers (
Id int not null primary key,
Name nvarchar(8) not null);
create table Containers_Animals (
Container int not null references Containers(Id),
Animal nvarchar(8) not null,
primary key (Container, Animal)
);
create table Containers_Foods (
Container int not null references Containers(Id),
Food nvarchar(8) not null,
primary key (Container, Food)
);
insert into Containers (Id, Name)
values (0, 'box'), (1, 'sack'), (2, 'bucket');
insert into Containers_Animals (Container, Animal)
values (1, 'monkey'), (2, 'dog'), (2, 'whale'), (2, 'lemur');
insert into Containers_Foods (Container, Food)
values (1, 'lime'), (2, 'bread'), (2, 'chips'), (2, 'apple'), (2, 'grape');
Coupled to a business object like this:
class Container {
public string Name;
public string[] Animals; // may be empty
public string[] Foods; // may be empty
}
And here's the way I'm constructing the query against it:
select c.Name container, a.Animal animal, f.Food food from Containers c
left join Containers_Animals a on a.Container = c.Id
left join Containers_Foods f on f.Container = c.Id;
Which gives these results:
container animal food
--------- -------- --------
box NULL NULL
sack monkey lime
bucket dog apple
bucket dog bread
bucket dog chips
bucket dog grape
bucket lemur apple
bucket lemur bread
bucket lemur chips
bucket lemur grape
bucket whale apple
bucket whale bread
bucket whale chips
bucket whale grape
What I'd like to see instead is a number of rows equal to the maximum number of values associated with the root table on any of the relationships, with empty space filled in with NULLs. That would keep the number of rows returned way, way, way down, while still being easy to transform into objects. Something like this:
container animal food
--------- -------- --------
box NULL NULL
sack monkey lime
bucket dog apple
bucket lemur bread
bucket whale chips
bucket NULL grape
Can it be done?
Why not just return two data sets ordered by container, and then do a logical merge join on them in the client? What you're asking for is going to make the DB engine do a lot more work, with a lot more complicated query, for (to me) small benefit.
It would look something like this. Use two left joins to make sure each data set has at least one instance of all container names, then loop through them simultaneously. Here is some rough pseudocode:
Dim CurrentContainer
If Not Animals.Eof Then
CurrentContainer = Animals.Container
End If
Do While Not Animals.Eof Or Not Foods.Eof
Row = New Couplet(AnimalType, FoodType);
If Animals.Animal = CurrentContainer Then
Row.AnimalType = Animals.Animal
Animals.MoveNext
End If
If Foods.Container = CurrentContainer Then
Row.FoodType = Foods.Food
Foods.MoveNext
End If
If Not Animals.Eof AndAlso Animals.Container <> CurrentContainer _
AndAlso Not Foods.Eof AndAlso Foods.Container <> CurrentContainer Then
CurrentContainer = [Container from either non-Eof recordset]
EndIf
'Process the row, output it, put it in a stack, build a new recordset, whatever.
Loop
However, of course what you're asking for is possible! Here are two ways.
Treat the inputs separately and join on their position:
WITH CA AS (
SELECT *,
Row_Number() OVER (PARTITION BY Container ORDER BY Animal) Pos
FROM Containers_Animals
), CF AS (
SELECT *,
Row_Number() OVER (PARTITION BY Container ORDER BY Food) Pos
FROM Containers_Foods
)
SELECT
C.Name,
CA.Animal,
CF.Food
FROM
Containers C
LEFT JOIN (
SELECT Container, Pos FROM CA
UNION SELECT Container, Pos FROM CF
) P ON C.Id = P.Container
LEFT JOIN CA
ON C.Id = CA.Container
AND P.Pos = CA.Pos
LEFT JOIN CF
ON C.Id = CF.Container
AND P.Pos = CF.Pos;
Concatenate the inputs vertically and pivot them:
WITH FoodAnimals AS (
SELECT
C.Name,
1 Which,
CA.Animal Item,
Row_Number() OVER (PARTITION BY C.Id ORDER BY (CA.Animal)) Pos
FROM
Containers C
LEFT JOIN Containers_Animals CA
ON C.Id = CA.Container
UNION
SELECT
C.Name,
2 Which,
CF.Food,
Row_Number() OVER (PARTITION BY C.Id ORDER BY (CF.Food)) Pos
FROM
Containers C
LEFT JOIN Containers_Foods CF
ON C.Id = CF.Container
)
SELECT
P.Name,
P.[1] Animal,
P.[2] Food
FROM
FoodAnimals FA
PIVOT (Max(Item) FOR Which IN ([1], [2])) P;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With