Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can a table violate it's own primary key index?

I have a PostgreSQL database which has a table with the primary key applied to three columns. According to the database, there is an index on the key:

Indexes:
    "full_log_pkey" PRIMARY KEY, btree (server_name, line_number, log_generation)

Yet some simple tests reveal that I have duplicate keys:

select count(*) from full_log;
  count
----------
 60644405

select count(*) from 
    (select distinct server_name, 
                     line_number, 
                     log_generation 
            from     full_log) as foo;
  count
----------
 60636564

Clearly there are fewer distinct rows (based on the primary key) than there are rows. My question is, how is this possible?

Edit: The full table definition is such:

                 Table "public.full_log"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 activity       | character(1)                |
 archivaldate   | timestamp without time zone |
 media_type     | character varying(5)        |
 vsn            | text                        |
 archive_set    | character varying(20)       |
 copy           | smallint                    |
 file_start     | integer                     |
 file_offset    | integer                     |
 fs_name        | character varying(20)       |
 inode          | double precision            |
 file_length    | bigint                      |
 file_type      | character(1)                |
 overflow       | integer                     |
 device_number  | integer                     |
 server_name    | text                        | not null
 path           | text                        |
 line_number    | integer                     | not null
 log_generation | integer                     | not null
Indexes:
    "full_log_pkey" PRIMARY KEY, btree (server_name, line_number, log_generation)
Foreign-key constraints:
    "full_log_server_name_fkey" FOREIGN KEY (server_name) REFERENCES servers(server_name)
Rules:
    insert_update_full_log AS
    ON INSERT TO full_log
   WHERE (EXISTS ( SELECT full_log.activity, full_log.archivaldate, full_log.media_type, full_log.vsn, full_log.archive_set, full_log.copy, full_log.file_start, full_log.file_offset, full_log.fs_name, full_log.inode, full_log.file_length, full_log.file_type, full_log.overflow, full_log.device_number, full_log.server_name, full_log.path, full_log.line_number, full_log.log_generation
           FROM full_log
          WHERE full_log.server_name = new.server_name AND full_log.line_number = new.line_number AND full_log.log_generation = new.log_generation)) DO INSTEAD  UPDATE full_log SET activity = new.activity, archivaldate = new.archivaldate, media_type = new.media_type, vsn = new.vsn, archive_set = new.archive_set, copy = new.copy, file_start = new.file_start, file_offset = new.file_offset, fs_name = new.fs_name, inode = new.inode, file_length = new.file_length, file_type = new.file_type, overflow = new.overflow, device_number = new.device_number, path = new.path
  WHERE full_log.server_name = new.server_name AND full_log.line_number = new.line_number AND full_log.log_generation = new.log_generation

For an example of duplicate rows:

 select * from full_log where line_number = 6332986;
 activity |    archivaldate     | media_type |  vsn   | archive_set | copy | file_start | file_offset | fs_name |   inode    | file_length | file_type | overflow | device_number | server_name |                                           path                                            | line_number | log_generation
----------+---------------------+------------+--------+-------------+------+------------+-------------+---------+------------+-------------+-----------+----------+---------------+-------------+-------------------------------------------------------------------------------------------+-------------+----------------
 A        | 2010-10-13 10:49:49 | ti         | Z00711 | lcbp_rel    |    1 |     226237 |      779099 | lcbp    | 21798068.3 |    31198108 | f         |        0 |          8511 | redact      | wdl/delivery/irishparis_2010_09/MSE2_Histoire des rois d'Angleterre/MSE2_239.TIF          |     6332986 |              1
 A        | 2010-10-13 10:49:49 | ti         | Z00711 | lcbp_rel    |    1 |     226237 |      779099 | lcbp    | 21798068.3 |    31198108 | f         |        0 |          8511 | redact      | wdl/delivery/irishparis_2010_09/MSE2_Histoire des rois d'Angleterre/MSE2_239.TIF          |     6332986 |              1
(2 rows)
like image 884
astine Avatar asked Apr 22 '11 20:04

astine


2 Answers

What does this query return?

select server_name, line_number, log_generation 
from full_log
group by server_name, line_number, log_generation
having count(*) > 1

It might help to compare that to

select line_number, log_generation 
from full_log
group by line_number, log_generation
having count(*) > 1

but it might not. I think this clause

WHERE (EXISTS ( SELECT full_log.activity, 
                       full_log.archivaldate, 
                       full_log.media_type, 
                       full_log.vsn, 
                       full_log.archive_set, 
                       full_log.copy, 
                       full_log.file_start, 
                       full_log.file_offset, 
                       full_log.fs_name, 
                       full_log.inode, 
                       full_log.file_length, 
                       full_log.file_type, 
                       full_log.overflow, 
                       full_log.device_number, 
                       full_log.server_name, 
                       full_log.path, 
                       full_log.line_number, 
                       full_log.log_generation
               FROM full_log
               WHERE full_log.server_name = new.server_name 
                 AND full_log.line_number = new.line_number 
                 AND full_log.log_generation = new.log_generation)) 

can be simplified to this clause. (Although I don't think this contributes to the problem.)

WHERE (EXISTS ( SELECT full_log.server_name, 
                       full_log.line_number, 
                       full_log.log_generation
                FROM full_log
                WHERE full_log.server_name = new.server_name 
                  AND full_log.line_number = new.line_number 
                  AND full_log.log_generation = new.log_generation)) 

You said PostgreSQL drops and recreates the index when when you alter the data type of a non-key column. I don't see that happening here, and I'm not sure I've ever seen that happen. I might not have noticed if the change succeeded, and I don't routinely change the data type of a column. (Now that I've said that, I couldn't begin to tell you the last time I did that.) I now have PostgreSQL 9.0.2 here.

like image 125
Mike Sherrill 'Cat Recall' Avatar answered Oct 08 '22 01:10

Mike Sherrill 'Cat Recall'


Are you using table inheritance? If so, the PK is not enforced across children (at least, not in 8.2).

like image 43
Andrew Lazarus Avatar answered Oct 07 '22 23:10

Andrew Lazarus