Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple left joins - keeping the returned row count down?

I'm trying to figure out how best to query against a schema that consists of one central table, plus a number of "attribute" tables (sorry, not sure of the best terminology here) that record one-to-many relationships. In the business layer, each of these tables corresponds to a collection that may contain zero or more elements.

Right now the code I'm looking at retrieves the data by getting a list of values from the master table, then looping over it and querying each of the "accessory" tables to populate these collections.

I'd like to try and get it down to a single query if I can. I tried using multiple LEFT JOINs. But this effectively joins against a cross product of the values in the accessory tables, which leads to an explosion of rows - especially when you add a few more joins. The table in question includes five such relationships, so the number of rows returned for each record is potentially enormous, and almost entirely composed of redundant data.

Here's a smaller synthetic example of some tables, data, the query structure I'm using, and results:

Database structure & data:

create table Containers (
  Id int not null primary key,
  Name nvarchar(8) not null);

create table Containers_Animals (
  Container int not null references Containers(Id),
  Animal nvarchar(8) not null,
  primary key (Container, Animal)
  );

create table Containers_Foods (
  Container int not null references Containers(Id),
  Food nvarchar(8) not null,
  primary key (Container, Food)
  );

insert into Containers (Id, Name) 
  values (0, 'box'), (1, 'sack'), (2, 'bucket');

insert into Containers_Animals (Container, Animal)
  values (1, 'monkey'), (2, 'dog'), (2, 'whale'), (2, 'lemur'); 

insert into Containers_Foods (Container, Food)
  values (1, 'lime'), (2, 'bread'), (2, 'chips'), (2, 'apple'), (2, 'grape');

Coupled to a business object like this:

class Container {
    public string Name;
    public string[] Animals;  // may be empty
    public string[] Foods;    // may be empty
}

And here's the way I'm constructing the query against it:

select c.Name container, a.Animal animal, f.Food food from Containers c
  left join Containers_Animals a on a.Container = c.Id
  left join Containers_Foods f on f.Container = c.Id;

Which gives these results:

container animal   food
--------- -------- --------
box       NULL     NULL
sack      monkey   lime
bucket    dog      apple
bucket    dog      bread
bucket    dog      chips
bucket    dog      grape
bucket    lemur    apple
bucket    lemur    bread
bucket    lemur    chips
bucket    lemur    grape
bucket    whale    apple
bucket    whale    bread
bucket    whale    chips
bucket    whale    grape

What I'd like to see instead is a number of rows equal to the maximum number of values associated with the root table on any of the relationships, with empty space filled in with NULLs. That would keep the number of rows returned way, way, way down, while still being easy to transform into objects. Something like this:

container animal   food
--------- -------- --------
box       NULL     NULL
sack      monkey   lime
bucket    dog      apple
bucket    lemur    bread
bucket    whale    chips
bucket    NULL     grape

Can it be done?

like image 368
Sean U Avatar asked Feb 23 '12 00:02

Sean U


1 Answers

Why not just return two data sets ordered by container, and then do a logical merge join on them in the client? What you're asking for is going to make the DB engine do a lot more work, with a lot more complicated query, for (to me) small benefit.

It would look something like this. Use two left joins to make sure each data set has at least one instance of all container names, then loop through them simultaneously. Here is some rough pseudocode:

Dim CurrentContainer
If Not Animals.Eof Then
   CurrentContainer = Animals.Container
End If
Do While Not Animals.Eof Or Not Foods.Eof
   Row = New Couplet(AnimalType, FoodType);
   If Animals.Animal = CurrentContainer Then
      Row.AnimalType = Animals.Animal
      Animals.MoveNext
   End If
   If Foods.Container = CurrentContainer Then
      Row.FoodType = Foods.Food
      Foods.MoveNext
   End If
   If Not Animals.Eof AndAlso Animals.Container <> CurrentContainer _
      AndAlso Not Foods.Eof AndAlso Foods.Container <> CurrentContainer Then
      CurrentContainer = [Container from either non-Eof recordset]
   EndIf
   'Process the row, output it, put it in a stack, build a new recordset, whatever.
Loop

However, of course what you're asking for is possible! Here are two ways.

  1. Treat the inputs separately and join on their position:

    WITH CA AS (
        SELECT *,
            Row_Number() OVER (PARTITION BY Container ORDER BY Animal) Pos
        FROM Containers_Animals
    ), CF AS (
        SELECT *,
            Row_Number() OVER (PARTITION BY Container ORDER BY Food) Pos
        FROM Containers_Foods
    )
    SELECT
        C.Name,
        CA.Animal,
        CF.Food
    FROM
        Containers C
        LEFT JOIN (
            SELECT Container, Pos FROM CA
            UNION SELECT Container, Pos FROM CF
        ) P ON C.Id = P.Container
        LEFT JOIN CA
            ON C.Id = CA.Container
            AND P.Pos = CA.Pos
        LEFT JOIN CF
            ON C.Id = CF.Container
            AND P.Pos = CF.Pos;
    
  2. Concatenate the inputs vertically and pivot them:

    WITH FoodAnimals AS (
        SELECT
            C.Name,
            1 Which,
            CA.Animal Item,
            Row_Number() OVER (PARTITION BY C.Id ORDER BY (CA.Animal)) Pos
        FROM
            Containers C
            LEFT JOIN Containers_Animals CA
                ON C.Id = CA.Container
        UNION
        SELECT
            C.Name,
            2 Which,
            CF.Food,
            Row_Number() OVER (PARTITION BY C.Id ORDER BY (CF.Food)) Pos
        FROM
            Containers C
            LEFT JOIN Containers_Foods CF
                ON C.Id = CF.Container
    )
    SELECT
        P.Name,
        P.[1] Animal,
        P.[2] Food
    FROM
        FoodAnimals FA
        PIVOT (Max(Item) FOR Which IN ([1], [2])) P;
    
like image 140
ErikE Avatar answered Nov 15 '22 09:11

ErikE