Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do I need a primary key for my table, which has a UNIQUE (composite 4-columns), one of which can be NULL?

I have the following table (PostgreSQL 8.3) which stores prices of some products. The prices are synchronised with another database, basically most of the fields below (apart from one) are not updated by our client - but instead dropped and refreshed every once-in-a-while to sync with another stock database:

CREATE TABLE product_pricebands (
    template_sku varchar(20) NOT NULL,
    colourid integer REFERENCES colour (colourid) ON DELETE CASCADE,        
    currencyid integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
    siteid integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,

    master_price numeric(10,2),

    my_custom_field boolean, 

    UNIQUE (template_sku, siteid, currencyid, colourid)
);

On the synchronisation, I basically DELETE most of the data above except for data WHERE my_custom_field is TRUE (if it's TRUE, it means the client updated this field via their CMS and therefore this record should not be dropped). I then INSERT 100s to 1000s of rows into the table, and UPDATE where the INSERT fails (i.e. where the combination of (template_sku, siteid, currencyid, colourid) already exists).

My question is - what best practice should be applied here to create a primary key? Is a primary key even needed? I wanted to make the primary key = (template_sku, siteid, currencyid, colourid) - but the colourid field can be NULL, and using it in a composite primary key is not possible.

From what I read on other forum posts, I think I have done the above correctly, and just need to clarify:

1) Should I use a "serial" primary key just in case I ever need one? At the moment I don't, and don't think I ever will, because the important data in the table is the price and my custom field, only identified by the (template_sku, siteid, currencyid, colourid) combination.

2) Since (template_sku, siteid, currencyid, colourid) is the combination that I will use to query a product's price, should I add any further indexing to my columns, such as the "template_sku" which is a varchar? Or is the UNIQUE constraint a good index already for my SELECTs?

like image 870
rishijd Avatar asked May 09 '12 12:05

rishijd


People also ask

Does a composite key have to be a primary key?

Composite Key Declaration When over one column or field in a table are combined to achieve the task of uniquely identifying row values, then that composite key can be either a primary or a candidate key of that table.

Can composite primary key contain NULL?

Hi, In composite primary key columns you cannot pass null values. Each column defined as a primary key would be validated so that null values are not passed on to them. If you have given a Unique constraint then we have a chance of NULL values being accepted.

Is it necessary for a table to have a primary key?

Every table can have (but does not have to have) a primary key. The column or columns defined as the primary key ensure uniqueness in the table; no two rows can have the same key. The primary key of one table may also help to identify records in other tables, and be part of the second table's primary key.

Can primary key be based on multiple columns?

The PRIMARY KEY constraint uniquely identifies each record in a table. Primary keys must contain UNIQUE values, and cannot contain NULL values. A table can have only ONE primary key; and in the table, this primary key can consist of single or multiple columns (fields).


1 Answers

Should I use a "serial" primary key just in case I ever need one?

You can easily add a serial column later if you need one:

ALTER TABLE product_pricebands ADD COLUMN id serial;

The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined, yet):

ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY;

If you reference the table from other tables I would advise to use such a surrogate primary key, because it is rather unwieldy to link by four columns. It is also slower in SELECTs with JOINs.

Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.


As

the colourid field can be NULL

you might want to create two unique indexes. The combination (template_sku, siteid, currencyid, colourid) cannot be a PRIMARY KEY, because of the nullable colourid, but you can create a UNIQUE constraint like you already have (implementing an index automatically):

ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_uni_idx
UNIQUE (template_sku, siteid, currencyid, colourid)

This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with (colourid IS NULL):

CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;

To cover all bases. I wrote more about that technique in a related answer on dba.SE.


The simple alternative to the above is to make colourid NOT NULL and create a primary key instead of the above product_pricebands_uni_idx.


Also, as you

basically DELETE most of the data

for your refill operation, it will be faster to drop indexes, that are not needed during the refill operation, and recreate those afterwards. It is faster by an order of magnitude to build an index from scratch than to add all rows incrementally.

How do you know, which indexes are used (needed)?

  • Test your queries with EXPLAIN ANALYZE.
  • Or use the built-in statistics. pgAdmin displays statistics in a separate tab for the selected object.

It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table and re-INSERT the survivors. Depends on whether you have foreign keys defined. Would look like this:

CREATE TEMP TABLE pr_tmp AS
SELECT * FROM product_pricebands WHERE my_custom_field;

TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;

This avoids a lot of vacuuming.

like image 120
Erwin Brandstetter Avatar answered Oct 29 '22 11:10

Erwin Brandstetter