To preface, I'm not very experienced with database design. I have a table of hashes and ids. When a group of new hashes are added, each row in the group gets the same id. If any hash within the new group already exists in the database, all hashes in the new group and existing group(s) get a new, shared id (effectively merging ids when hashes are repeated):
INSERT INTO hashes
(id, hash)
VALUES
($new_id, ...), ($new_id, ...)
ON DUPLICATE KEY UPDATE
repeat_count = repeat_count + 1;
INSERT INTO hashes_lookup SELECT DISTINCT id FROM hashes WHERE hash IN (...);
UPDATE hashes JOIN hashes_lookup USING (id) SET id = '$new_id';
TRUNCATE TABLE hashes_lookup;
Other tables reference these ids, so that if an id changes, foreign key constraints take care of updating the ids across tables. The issue here, however, is that I can't enforce uniqueness across any of the child tables. If I do, my queries fail with:
Foreign key constraint for table '...', record '...' would lead to a duplicate entry in table '...'
This error makes sense, given the following test case where id
and value
are a composite unique key:
id | value
---+-------
a | 1
b | 2
c | 1
Then a
gets changed to c
:
id | value
---+-------
c | 1
b | 2
c | 1
But c,1
already exists.
It would be ideal if there was an ON UPDATE IGNORE CASCADE
option, so that if a duplicate row exists, any duplicating inserts are ignored. However, I'm pretty sure the real issue here is my database design, so I am open to any and all suggestions. My current solution is to not enforce uniqueness across child tables, which leads to a lot of redundant rows.
Edit:
CREATE TABLE `hashes` (
`hash` char(64) NOT NULL,
`id` varchar(128) NOT NULL,
`repeat_count` int(11) NOT NULL DEFAULT '0',
`insert_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`update_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `hash` (`hash`) USING BTREE,
KEY `id` (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `emails` (
`id` varchar(128) NOT NULL,
`group_id` char(5) NOT NULL,
`email` varchar(500) NOT NULL,
KEY `index` (`id`) USING BTREE,
UNIQUE KEY `id` (`id`,`group_id`,`email`(255)) USING BTREE,
CONSTRAINT `emails_ibfk_1` FOREIGN KEY (`id`) REFERENCES `hashes` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I think will be good to create table hash_group
to store id of hash group:
CREATE TABLE `hash_group` (
`id` BIGINT AUTO_INCREMENT NOT NULL,
`group_name` varchar(128) NOT NULL,
`insert_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`update_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `group_name` (`group_name`) USING BTREE,
PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
And change structure of existing tables:
CREATE TABLE `hashes` (
`hash` char(64) NOT NULL,
`hash_group_id` BIGINT NOT NULL,
`repeat_count` int(11) NOT NULL DEFAULT '0',
`insert_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`update_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `hash` (`hash`) USING BTREE,
KEY `hashes_hash_group_id_index` (`hash_group_id`) USING BTREE,
CONSTRAINT `hashes_hash_group_id_fk` FOREIGN KEY (`hash_group_id`) REFERENCES `hash_group` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `emails` (
`hash_group_id` BIGINT NOT NULL,
`group_id` char(5) NOT NULL,
`email` varchar(500) NOT NULL,
KEY `emails_hash_group_id_index` (`hash_group_id`) USING BTREE,
UNIQUE KEY `emails_unique` (`hash_group_id`,`group_id`,`email`(255)) USING BTREE,
CONSTRAINT `emails_ibfk_1` FOREIGN KEY (`hash_group_id`) REFERENCES `hash_group` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Also create trigger to update hash group if you need to do it:
DELIMITER $$
CREATE TRIGGER `update_hash_group_name` AFTER UPDATE ON `hashes`
FOR EACH ROW
BEGIN
UPDATE `hash_group`
SET `group_name` = md5(now()) -- replace to you hash formula
WHERE id = NEW.hash_group_id;
END;$$
DELIMITER ;
And create function for getting actual group id:
DROP FUNCTION IF EXISTS get_hash_group;
DELIMITER $$
CREATE FUNCTION get_hash_group(id INT) RETURNS INT
BEGIN
IF (id IS NULL) THEN
INSERT INTO `hash_group` (`group_name`)
VALUES (md5(now())); -- replace to you hash
RETURN LAST_INSERT_ID();
END IF;
RETURN id;
END;$$
DELIMITER ;
Scenario:
Initial fill:
INSERT INTO `hash_group` (id, group_name) VALUES
(1, 'test1'),
(2, 'test2'),
(3, 'test3');
INSERT INTO `hashes` (hash, hash_group_id) VALUES
('hash11', 1),
('hash12', 1),
('hash13', 1),
('hash2', 2),
('hash3', 3);
INSERT INTO `emails` (hash_group_id, group_id, email)
VALUES
(1, 'g1', 'example1@'),
(2, 'g1', 'example2@'),
(3, 'g1', 'example2@');
Updating of hash_group
scenario:
START TRANSACTION;
-- Get @min_group_id - minimum group id (we will leave this id and delete other)
SELECT MIN(hash_group_id) INTO @min_group_id
FROM hashes
WHERE hash IN ('hash11', 'hash12', 'hash2', 'hash15');
-- Replace other group ids in email table to @min_group_id
UPDATE `emails`
SET `hash_group_id` = @min_group_id
WHERE `hash_group_id` IN (
SELECT hash_group_id
FROM hashes
WHERE @min_group_id IS NOT NULL
AND hash IN ('hash11', 'hash12', 'hash2', 'hash15')
-- Update only if we are gluy several hash_groups
AND `hash_group_id` > @min_group_id
);
-- Delete other hash_groups and leave only group with @min_group_id
DELETE FROM `hash_group` WHERE `id` IN (
SELECT hash_group_id
FROM hashes
WHERE @min_group_id IS NOT NULL
AND hash IN ('hash11', 'hash12', 'hash2', 'hash15')
-- Delete only if we are gluy several hash_groups
AND `hash_group_id` > @min_group_id
);
-- @group_id = existing hash_group.id or create new if @min_group_id is null (all inserted hashes are new)
SELECT get_hash_group(@min_group_id) INTO @group_id;
-- Now we can insert new hashes.
INSERT INTO `hashes` (hash, hash_group_id) VALUES
('hash11', @group_id),
('hash12', @group_id),
('hash2', @group_id),
('hash15', @group_id)
ON DUPLICATE KEY
UPDATE repeat_count = repeat_count + 1;
COMMIT;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With