Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Joining tables when data may or may not exist

Let me start out by saying I didn't design this database; just trying to work with it.

I'm trying to retrieve the failures for a set of bicycles, where the most important determining factor is whether any of the parts in the bicycle have a particular attribute. The attribute is set in the parts table. The part is part of an assembly, which has a reference to a bigger assembly. The assembly may have a particular bicycle type assigned to it; if it doesn't, we assume that ALL bicycle types are assigned to the assembly. The parts may also have specific bicycles assigned to them, identified by a serial number.

So, we can assume:

  1. A record in the failures table will always include a serial number, higher assembly, and bicycle type.
  2. A part's assembly will always have a reference to a higher assembly
  3. A part's assembly may or may not have references to bicycle types.
  4. A part may or may not have references to specific serial numbers

When searching for failures where there is a part with a particular attribute, if the part has references to particular bicycles, we only want to find those. If it doesn't, and the part's assembly has references to specific bicycle types, we only want to find failures related to the assemblies that have references to those types, and that contain those parts. Otherwise, we want to find all failures that are related to the higher assemblies that contain the parts.

My problem is that if I join on the serial numbers, I always get only parts with serial numbers assigned, and if I join on the bicycle types, I only get the parts whose assemblies have types assigned. I'm not sure whether I'm attempting something that's not realistic given the database design, or whether I'm approaching the joins incorrectly.

Following is the query so far.

SELECT f_bicycle_type, f_serial_number, f_big_assembly
FROM ( 
    SELECT DISTINCT f.f_bicycle_type, f.f_serial_number, f.f_big_assembly, p_important_attr 
    from failures f 
    left outer join (    
        select distinct bt.bt_bicycle_type, b_serial_number, a_big_assembly, p_important_attr  
        from (          
            select distinct b.b_serial_number, a.a_big_assembly, p.p_assembly_id, p.p_important_attr
            from parts p
            join assemblies a on p.p_assembly_id = a.a_assembly_id
            left outer join parts_bicycles b on b.b_part_id = p.p_id  
            where p.p_important_attr = 'awesome'
        ) p_join_a_and_b 
        left outer join assembly_bicycle_types bt on bt.bt_assembly_id = p_join_a_and_b.p_assembly_id 
    ) p_join_a_and_b_join_bt 
    on f.f_big_assembly = p_join_a_and_b_join_bt.a_big_assembly 
    -- problem join clause - if an explicit type has not been assigned to the assembly, we want to include ALL types
    and f_bicycle_type = p_join_a_and_b_join_bt.bt_bicycle_type
    -- problem join clause - there may not be explicit serial numbers assigned to a given part
    and f_serial_number = b_serial_number
) z
WHERE p_important_attr = 'awesome';

Test case sql (for Oracle):

CREATE TABLE failures (
f_bicycle_type VARCHAR(20),
f_serial_number NUMBER(20),
f_big_assembly VARCHAR(5)); 

CREATE TABLE parts (
p_id NUMBER(20),
p_assembly_id NUMBER(20),
p_important_attr VARCHAR(20));

CREATE TABLE assemblies (
a_assembly_id NUMBER(20),
a_big_assembly VARCHAR(5)); 

CREATE TABLE parts_bicycles (
b_part_id NUMBER(20),
b_serial_number NUMBER(20));    

CREATE TABLE assembly_bicycle_types (
bt_assembly_id NUMBER(20),
bt_bicycle_type VARCHAR(20));

INSERT ALL
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('tandem', 1000001, 'A1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('bmx', 1000002, 'A1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('tandem', 1000003, 'B1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('cruiser', 1000004, 'B1000')  
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('bmx', 1000005, 'C1000')  
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('motocross', 1000006, 'C1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('cruiser', 1000007, 'C1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('bmx', 1000008, 'D1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('bmx', 1000009, 'D1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('cruiser', 1000010, 'E1000')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (1, 1001, 'awesome')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (2, 1001, 'ordinary')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (3, 2001, 'awesome')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (4, 3001, 'awesome')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (5, 4001, 'awesome')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (6, 5001, 'ordinary')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (1001, 'A1000')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (2001, 'B1000')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (3001, 'C1000')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (4001, 'D1000')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (5001, 'E1000')
INTO parts_bicycles (b_part_id, b_serial_number)
VALUES (4, 1000005)
INTO parts_bicycles (b_part_id, b_serial_number)
VALUES (4, 1000006)
INTO parts_bicycles (b_part_id, b_serial_number)
VALUES (5, 1000008)
INTO assembly_bicycle_types (bt_assembly_id, bt_bicycle_type)
VALUES (02001, 'tandem')
INTO assembly_bicycle_types (bt_assembly_id, bt_bicycle_type)
VALUES (04001, 'bmx')
SELECT * FROM DUAL;

For MySQL:

 CREATE TABLE failures (
f_bicycle_type VARCHAR(20),
f_serial_number INTEGER(20),
f_big_assembly VARCHAR(5));
CREATE TABLE parts(
p_id INTEGER( 20 ) ,
p_assembly_id INTEGER( 20 ) ,
p_important_attr VARCHAR( 20 )
);
CREATE TABLE assemblies(
a_assembly_id INTEGER( 20 ) ,
a_big_assembly VARCHAR( 5 )
);
CREATE TABLE parts_bicycles(
b_part_id INTEGER( 20 ) ,
b_serial_number INTEGER( 20 )
);
CREATE TABLE assembly_bicycle_types(
bt_assembly_id INTEGER( 20 ) ,
bt_bicycle_type VARCHAR( 20 )
);

INSERT INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('tandem', 1000001, 'A1000'),('bmx', 1000002, 'A1000'), ('tandem', 1000003, 'B1000'),    ('cruiser', 1000004, 'B1000') ,('bmx', 1000005, 'C1000'), ('motocross', 1000006, 'C1000')
,('cruiser', 1000007, 'C1000')
,('bmx', 1000008, 'D1000')
,('bmx', 1000009, 'D1000')
, ('cruiser', 1000010, 'E1000');
insert INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (1, 1001, 'awesome'), (2, 1001, 'ordinary'), (3, 2001, 'awesome'), (4, 3001, 'awesome'), (5, 4001, 'awesome'),(6, 5001, 'ordinary');
INSERT INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (1001, 'A1000'), (2001, 'B1000'), (3001, 'C1000'), (4001, 'D1000'),(5001, 'E1000');
    INSERT INTO parts_bicycles (b_part_id, b_serial_number)
VALUES (4, 1000005),(4, 1000006),(5, 1000008)
INSERT INTO assembly_bicycle_types (bt_assembly_id, bt_bicycle_type)
VALUES (02001, 'tandem'), (04001, 'bmx');

Sample data and desired results:

-- failures table
-- f_bicycle_type   || f_serial_number  || f_big_assembly
---------------------------------------------------------
  tandem               1000001             A1000
  bmx                  1000002             A1000
  tandem               1000003             B1000
  cruiser              1000004             B1000
  bmx                  1000005             C1000
  motocross            1000006             C1000
  cruiser              1000007             C1000
  bmx                  1000008             D1000
  bmx                  1000009             D1000
  cruiser              1000010             E1000

  -- parts table
  -- p_id   || p_assembly_id    || p_important_attr
  ------------------------------------------------
     1          1001                awesome
     2          1001                ordinary
     3          2001                awesome
     4          3001                awesome
     5          4001                awesome
     6          5001                ordinary

  -- assemblies table
  -- a_assembly_id  || a_big_assembly
  -----------------------------------
     1001              A1000
     2001              B1000
     3001              C1000
     4001              D1000
     5001              E1000

  -- parts_bicycles table
  -- b_part_id  || b_serial_number
  --------------------------------
     4              1000005
     4              1000006
     5              1000008

  -- assembly_bicycle_types table
  -- bt_assembly_id || bt_bicycle_type
  ------------------------------------
     02001             tandem
     04001             bmx

-- desired results from failures table
-- f_bicycle_type   || f_serial_number  || f_big_assembly
---------------------------------------------------------
  tandem               1000001             A1000
  bmx                  1000002             A1000
  tandem               1000003             B1000
  bmx                  1000005             C1000
      motocross            1000006             C1000
  bmx                  1000008             D1000

And the actual results, with the problem joins in place:

-- actual results from failures table
-- f_bicycle_type   || f_serial_number  || f_big_assembly
---------------------------------------------------------
  bmx                  1000008             D1000
like image 898
earachefl Avatar asked Nov 21 '12 18:11

earachefl


2 Answers

There you go (PostgreSQL flavor):

WITH chosen_parts AS (
  SELECT * FROM parts LEFT JOIN parts_bicycles ON b_part_id = p_id 
    WHERE p_important_attr = 'awesome'
), chosen_assemblies AS (
  SELECT * FROM assemblies JOIN chosen_parts ON p_assembly_id = a_assembly_id 
    LEFT JOIN assembly_bicycle_types ON bt_assembly_id = a_assembly_id 
  WHERE b_serial_number IS NULL
)
SELECT failures.* FROM chosen_parts JOIN failures 
  ON f_serial_number = b_serial_number
UNION
SELECT failures.* FROM chosen_assemblies JOIN failures 
  ON f_big_assembly = a_big_assembly 
  WHERE bt_bicycle_type = f_bicycle_type
    OR bt_bicycle_type IS NULL;

Feel free to add an outer SELECT DISTINCT * FROM if duplicates are a concern.

like image 113
Divide Avatar answered Nov 03 '22 06:11

Divide


The following query returns your desired result set. Essentially, this build the relationships between parts, assemblies and bicycle types first, then performs a complex, prioritized join to failures to get the actual result.

SELECT DISTINCT f.f_bicycle_type, f.f_serial_number, f.f_big_assembly
FROM  parts p
      LEFT JOIN parts_bicycles pb
         ON p.p_id = pb.b_part_id
      LEFT JOIN assemblies a
         ON p.p_assembly_id = a.a_assembly_id
      LEFT JOIN assembly_bicycle_types abt
         ON a.a_assembly_id = abt.bt_assembly_id
      LEFT JOIN failures f
         ON -- First priority is parts that map directly
            pb.b_serial_number = f.f_serial_number 
            -- Second priority is assemblies that map to type
            OR (pb.b_serial_number IS NULL 
                AND abt.bt_bicycle_type = f.f_bicycle_type) 
            -- Third priority is assemblies that map directly
            OR (pb.b_serial_number IS NULL 
                AND abt.bt_bicycle_type IS NULL 
                AND a.a_big_assembly = f.f_big_assembly)
WHERE  p.p_important_attr = 'awesome'
ORDER BY f.f_serial_number  

SQL Fiddle

II don't think this addresses the issue where assemblies that are not assigned belong to all bicycles, but it's not clear from your sample data how that is intended to work.

like image 43
Allan Avatar answered Nov 03 '22 06:11

Allan