I have a PostgreSQL database which has a table with the primary key applied to three columns. According to the database, there is an index on the key:
Indexes:
"full_log_pkey" PRIMARY KEY, btree (server_name, line_number, log_generation)
Yet some simple tests reveal that I have duplicate keys:
select count(*) from full_log;
count
----------
60644405
select count(*) from
(select distinct server_name,
line_number,
log_generation
from full_log) as foo;
count
----------
60636564
Clearly there are fewer distinct rows (based on the primary key) than there are rows. My question is, how is this possible?
Edit: The full table definition is such:
Table "public.full_log"
Column | Type | Modifiers
----------------+-----------------------------+-----------
activity | character(1) |
archivaldate | timestamp without time zone |
media_type | character varying(5) |
vsn | text |
archive_set | character varying(20) |
copy | smallint |
file_start | integer |
file_offset | integer |
fs_name | character varying(20) |
inode | double precision |
file_length | bigint |
file_type | character(1) |
overflow | integer |
device_number | integer |
server_name | text | not null
path | text |
line_number | integer | not null
log_generation | integer | not null
Indexes:
"full_log_pkey" PRIMARY KEY, btree (server_name, line_number, log_generation)
Foreign-key constraints:
"full_log_server_name_fkey" FOREIGN KEY (server_name) REFERENCES servers(server_name)
Rules:
insert_update_full_log AS
ON INSERT TO full_log
WHERE (EXISTS ( SELECT full_log.activity, full_log.archivaldate, full_log.media_type, full_log.vsn, full_log.archive_set, full_log.copy, full_log.file_start, full_log.file_offset, full_log.fs_name, full_log.inode, full_log.file_length, full_log.file_type, full_log.overflow, full_log.device_number, full_log.server_name, full_log.path, full_log.line_number, full_log.log_generation
FROM full_log
WHERE full_log.server_name = new.server_name AND full_log.line_number = new.line_number AND full_log.log_generation = new.log_generation)) DO INSTEAD UPDATE full_log SET activity = new.activity, archivaldate = new.archivaldate, media_type = new.media_type, vsn = new.vsn, archive_set = new.archive_set, copy = new.copy, file_start = new.file_start, file_offset = new.file_offset, fs_name = new.fs_name, inode = new.inode, file_length = new.file_length, file_type = new.file_type, overflow = new.overflow, device_number = new.device_number, path = new.path
WHERE full_log.server_name = new.server_name AND full_log.line_number = new.line_number AND full_log.log_generation = new.log_generation
For an example of duplicate rows:
select * from full_log where line_number = 6332986;
activity | archivaldate | media_type | vsn | archive_set | copy | file_start | file_offset | fs_name | inode | file_length | file_type | overflow | device_number | server_name | path | line_number | log_generation
----------+---------------------+------------+--------+-------------+------+------------+-------------+---------+------------+-------------+-----------+----------+---------------+-------------+-------------------------------------------------------------------------------------------+-------------+----------------
A | 2010-10-13 10:49:49 | ti | Z00711 | lcbp_rel | 1 | 226237 | 779099 | lcbp | 21798068.3 | 31198108 | f | 0 | 8511 | redact | wdl/delivery/irishparis_2010_09/MSE2_Histoire des rois d'Angleterre/MSE2_239.TIF | 6332986 | 1
A | 2010-10-13 10:49:49 | ti | Z00711 | lcbp_rel | 1 | 226237 | 779099 | lcbp | 21798068.3 | 31198108 | f | 0 | 8511 | redact | wdl/delivery/irishparis_2010_09/MSE2_Histoire des rois d'Angleterre/MSE2_239.TIF | 6332986 | 1
(2 rows)
What does this query return?
select server_name, line_number, log_generation
from full_log
group by server_name, line_number, log_generation
having count(*) > 1
It might help to compare that to
select line_number, log_generation
from full_log
group by line_number, log_generation
having count(*) > 1
but it might not. I think this clause
WHERE (EXISTS ( SELECT full_log.activity,
full_log.archivaldate,
full_log.media_type,
full_log.vsn,
full_log.archive_set,
full_log.copy,
full_log.file_start,
full_log.file_offset,
full_log.fs_name,
full_log.inode,
full_log.file_length,
full_log.file_type,
full_log.overflow,
full_log.device_number,
full_log.server_name,
full_log.path,
full_log.line_number,
full_log.log_generation
FROM full_log
WHERE full_log.server_name = new.server_name
AND full_log.line_number = new.line_number
AND full_log.log_generation = new.log_generation))
can be simplified to this clause. (Although I don't think this contributes to the problem.)
WHERE (EXISTS ( SELECT full_log.server_name,
full_log.line_number,
full_log.log_generation
FROM full_log
WHERE full_log.server_name = new.server_name
AND full_log.line_number = new.line_number
AND full_log.log_generation = new.log_generation))
You said PostgreSQL drops and recreates the index when when you alter the data type of a non-key column. I don't see that happening here, and I'm not sure I've ever seen that happen. I might not have noticed if the change succeeded, and I don't routinely change the data type of a column. (Now that I've said that, I couldn't begin to tell you the last time I did that.) I now have PostgreSQL 9.0.2 here.
Are you using table inheritance? If so, the PK is not enforced across children (at least, not in 8.2).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With