Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query to search an unique ID that can be in three different tables

I have three tables that control products, colors and sizes. Products can have or not colors and sizes. Colors can or not have sizes.

product      color                           size
-------      -------                         -------
id           id                              id
unique_id    id_product (FK from product)    id_product (FK from version)
stock        unique_id                       id_version (FK from version)
title        stock                           unique_id
                                             stock

The unique_id column, that is present in all tables, is a serial type (autoincrement) and its counter is shared with the three tables, basically it works as a global unique ID between them.

It works fine, but i am trying to increase the query performance when i have to select some fields based in the unique_id.

As i don't know where is the unique_id that i am looking for, i am using UNION, like below:

select title, stock
from product 
where unique_id = 10

UNION

select p.title, c.stock
from color c
join product p on c.id_product = p.id
where c.unique_id = 10

UNION

select p.title, s.stock
from size s
join product p on s.id_product = p.id
where s.unique_id = 10;

Is there a better way to do this? Thanks for any suggestion!

EDIT 1

Based on @ErwinBrandstetter and @ErikE answers i decided to use the below query. The main reasons is:

1) As unique_id has indexes in all tables, i will get a good performance

2) Using the unique_id i will find the product code, so i can get all columns i need using a another simple join

SELECT 

    p.title,
    ps.stock

FROM (

    select id as id_product, stock
    from product 
    where unique_id = 10

    UNION

    select id_product, stock
    from color
    where unique_id = 10

    UNION

    select id_product, stock
    from size
    where unique_id = 10

) AS ps

JOIN product p ON ps.id_product = p.id;
like image 378
Marcio Mazzucato Avatar asked Jul 10 '12 21:07

Marcio Mazzucato


People also ask

Can you select from 3 tables in SQL?

Using JOIN in SQL doesn't mean you can only join two tables. You can join 3, 4, or even more! The possibilities are limitless.

How do I find the unique ID of a SQL table?

In SQL Server, you can use the sp_special_columns system stored procedure to identify a unique identifier for the table. Specifically, it returns the optimal set of columns that uniquely identify a row in the table. It also returns columns automatically updated when any value in the row is updated by a transaction.


2 Answers

PL/pgSQL function

To solve the problem at hand, a plpgsql function like the following should be faster:

CREATE OR REPLACE FUNCTION func(int)
  RETURNS TABLE (title text, stock int) LANGUAGE plpgsql AS
$BODY$
BEGIN

RETURN QUERY
SELECT p.title, p.stock
FROM   product p
WHERE  p.unique_id = $1; -- Put the most likely table first.

IF NOT FOUND THEN
    RETURN QUERY
    SELECT p.title, c.stock
    FROM   color c
    JOIN   product p ON c.id_product = p.id
    WHERE  c.unique_id = $1;
END;

IF NOT FOUND THEN
    RETURN QUERY
    SELECT p.title, s.stock
    FROM   size s
    JOIN   product p ON s.id_product = p.id
    WHERE  s.unique_id = $1;
END IF;

END;
$BODY$;

Updated function with table-qualified column names to avoid naming conflicts with OUT parameters.

RETURNS TABLE requires PostgreSQL 8.4, RETURN QUERY requires version 8.2. You can substitute both for older versions.

It goes without saying that you need to index the columns unique_id of every involved table. id should be indexed automatically, being the primary key.


Redesign

Ideally, you can tell which table from the ID alone. You could keep using one common sequence, but add 100000000 for the first table, 200000000 for the second and 300000000 for the third - or whatever suits your needs. This way, the least significant part of the number is easily distinguishable.

A plain integer spans numbers from -2147483648 to +2147483647, move to bigint if that's not enough for you. I would stick to integer IDs, though, if possible. They are smaller and faster than bigint or text.


CTEs (experimental!)

If you cannot create a function for some reason, this pure SQL solution might do a similar trick:

WITH x(uid) AS (SELECT 10) -- provide unique_id here
    , a AS (
    SELECT title, stock
    FROM   x, product 
    WHERE  unique_id = x.uid
    )
    , b AS (
    SELECT p.title, c.stock
    FROM   x, color c
    JOIN   product p ON c.id_product = p.id
    WHERE  NOT EXISTS (SELECT 1 FROM a)
    AND    c.unique_id = x.uid
    )
    , c AS (
    SELECT p.title, s.stock
    FROM   x, size s
    JOIN   product p ON s.id_product = p.id
    WHERE  NOT EXISTS (SELECT 1 FROM b)
    AND    s.unique_id = x.uid
    )
SELECT * FROM a
UNION ALL
SELECT * FROM b
UNION ALL
SELECT * FROM c;

I am not sure whether it avoids additional scans like I hope. Would have to be tested. This query requires at least PostgreSQL 8.4.


Upgrade!

As I just learned, the OP runs on PostgreSQL 8.1.
Upgrading alone would speed up the operation a lot.


Query for PostgreSQL 8.1

As you are limited in your options, and a plpgsql function is not possible, this function should perform better than the one you have. Test with EXPLAIN ANALYZE - available in v8.1.

SELECT title, stock
FROM   product 
WHERE  unique_id = 10

UNION ALL
SELECT p.title, ps.stock
FROM   product p
JOIN  (
    SELECT id_product, stock
    FROM   color
    WHERE  unique_id = 10

    UNION ALL
    SELECT id_product, stock
    FROM   size
    WHERE  unique_id = 10
    ) ps ON ps.id_product = p.id;
like image 117
Erwin Brandstetter Avatar answered Oct 03 '22 10:10

Erwin Brandstetter


I think it's time for a redesign.

You have things that you're using as bar codes for items that are basically all the same in one respect (they are SerialNumberItems), but have been split into multiple tables because they are different in other respects.

I have several ideas for you:

Change the Defaults

Just make each product required to have one color "no color" and one size "no size". Then you can query any table you want to find the info you need.

SuperType/SubType

Without too much modification you could use the supertype/subtype database design pattern.

In it, there is a parent table where all the distinct detail-level identifiers live, and the shared columns of the subtype tables go in the supertype table (the ways that all the items are the same). There is one subtype table for each different way that the items are distinct. If mutual exclusivity of the subtype is required (you can have a Color or a Size but not both), then the parent table is given a TypeID column and the subtype tables have an FK to both the ParentID and the TypeID. Looking at your design, in fact you would not use mutual exclusivity.

If you use the pattern of a supertype table, you do have the issue of having to insert in two parts, first to the supertype, then the subtype. Deleting also requires deleting in reverse order. But you get a great benefit of being able to get basic information such as Title and Stock out of the supertype table with a single query.

You could even create schema-bound views for each subtype, with instead-of triggers that convert inserts, updates, and deletes into operations on the base table + child table.

A Bigger Redesign

You could completely change how Colors and Sizes are related to products.

First, your patterns of "has-a" are these:

  • Product (has nothing)
  • Product->Color
  • Product->Size
  • Product->Color->Size

There is a problem here. Clearly Product is the main item that has other things (colors and sizes) but colors don't have sizes! That is an arbitrary assignment. You may as well have said that Sizes have Colors--it doesn't make a difference. This reveals that your table design may not be best, as you're trying to model orthogonal data in a parent-child type of relationship. Really, products have a ColorAndSize.

Furthermore, when a product comes in colors and sizes, what does the uniqueid in the Color table mean? Can such a product be ordered without a size, having only a color? This design is assigning a unique ID to something that (it seems to me) should never be allowed to be ordered--but you can't find this information out from the Color table, you have to compare the Color and Size tables first. It is a problem.

I would design this as: Table Product. Table Size listing all distinct sizes possible for any product ever. Table Color listing all distinct colors possible for any product ever. And table OrderableProduct that has columns ProductId, ColorID, SizeID, and UniqueID (your bar code value). Additionally, each product must have one color and one size or it doesn't exist.

Basically, Color and Size are like X and Y coordinates into a grid; you are filling in the boxes that are allowable combinations. Which one is the row and which the column is irrelevant. Certainly, one is not a child of the other.

If there are any reasonable rules, in general, about what colors or sizes can be applied to various sub-groups of products, there might be utility in a ProductType table and a ProductTypeOrderables table that, when creating a new product, could populate the OrderableProduct table with the standard set—it could still be customized but might be easier to modify than to create anew. Or, it could define the range of colors and sizes that are allowable. You might need separate ProductTypeAllowedColor and ProductTypeAllowedSize tables. For example, if you are selling T-shirts, you'd want to allow XXXS, XXS, XS, S, M, L, XL, XXL, XXXL, and XXXXL, even if most products never use all those sizes. But for soft drinks, the sizes might be 6-pack 8oz, 24-pack 8oz, 2 liter, and so on, even if each soft drink is not offered in that size (and soft drinks don't have colors).

In this new scheme, you only have one table to query to find the correct orderable product. With proper indexes, it should be blazing fast.

Your Question

You asked:

in PostgreSQL, so do you think if i use indexes on unique_id i will get a satisfactory performance?

Any column or set of columns that you use to repeatedly look up data must have an index! Any other pattern will result in a full table scan each time, which will be awful performance. I am sure that these indexes will make your queries lightning fast as it will take only one leaf-level read per table.

like image 20
ErikE Avatar answered Oct 03 '22 09:10

ErikE