I have a table which will potentially store hundreds of thousands of integers:
desc id_key_table;
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id_key | int(16) | NO | PRI | NULL | |
+----------------+--------------+------+-----+---------+-------+
From a program, I have a large set of integers. I'd like to see which of these integers are NOT in the above id_key column.
So far I've come up with the following approaches:
1) Iterate through each integer and perform a:
select count(*) count from id_key_table where id_key = :id_key
When count is 0 the id_key is missing from the table.
This seems like a horrible, horrible way to do it.
2) Create a temporary table, insert each of the values into the temporary table, and perform a JOIN on the two tables.
create temporary table id_key_table_temp (id_key int(16) primary key );
insert into id_key_table_temp values (1),(2),(3),...,(500),(501);
select temp.id_key
from id_key_table_temp temp left join id_key_table as main
on temp.id_key = main.id_key
where main.killID is null;
drop table id_key_table_temp;
This seems like the best approach, however, I'm sure there is a far better approach I haven't thought of yet. I'd prefer to not have to create a temporary table and use one query to determine which integers are missing.
Is there a proper query for this type of search?
(MySQL)
Using your code in the second example given in the question, I created two stored procedures (SP): 1 SP to load a sample table of prime numbers as keys, the other SP to find the missing integers.
Here is the first SP:
DELIMITER $$
DROP PROCEDURE IF EXISTS `test`.`CreateSampleTable` $$
CREATE PROCEDURE `test`.`CreateSampleTable` (maxinttoload INT)
BEGIN
DECLARE X,OKTOUSE,MAXLOOP INT;
DROP TABLE IF EXISTS test.id_key_table;
CREATE TABLE test.id_key_table (id_key INT(16)) ENGINE=MyISAM;
SET X=2;
WHILE X <= maxinttoload DO
INSERT INTO test.id_key_table VALUES (X);
SET X = X + 1;
END WHILE;
ALTER TABLE test.id_key_table ADD PRIMARY KEY (id_key);
SET MAXLOOP = FLOOR(SQRT(maxinttoload));
SET X = 2;
WHILE X <= MAXLOOP DO
DELETE FROM test.id_key_table WHERE MOD(id_key,X) = 0 AND id_key > X;
SELECT MIN(id_key) INTO OKTOUSE FROM test.id_key_table WHERE id_key > X;
SET X = OKTOUSE;
END WHILE;
OPTIMIZE TABLE test.id_key_table;
SELECT * FROM test.id_key_table;
END $$
DELIMITER ;
Here is the second SP:
DELIMITER $$
DROP PROCEDURE IF EXISTS `test`.`GetMissingIntegers` $$
CREATE PROCEDURE `test`.`GetMissingIntegers` (maxinttoload INT)
BEGIN
DECLARE X INT;
DROP TABLE IF EXISTS test.id_key_table_temp;
CREATE TEMPORARY TABLE test.id_key_table_temp (id_key INT(16)) ENGINE=MyISAM;
SET X=1;
WHILE X <= maxinttoload DO
INSERT INTO test.id_key_table_temp VALUES (X);
SET X = X + 1;
END WHILE;
ALTER TABLE test.id_key_table_temp ADD PRIMARY KEY (id_key);
SELECT temp.id_key FROM test.id_key_table_temp temp
LEFT JOIN test.id_key_table main USING (id_key)
WHERE main.id_key IS NULL;
END $$
DELIMITER ;
Here is the Sample Run of First SP using the number 25 to create prime numbers:
mysql> CALL test.CreateSampleTable(25);
+-------------------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-------------------+----------+----------+----------+
| test.id_key_table | optimize | status | OK |
+-------------------+----------+----------+----------+
1 row in set (0.16 sec)
+--------+
| id_key |
+--------+
| 2 |
| 3 |
| 5 |
| 7 |
| 11 |
| 13 |
| 17 |
| 19 |
| 23 |
+--------+
9 rows in set (0.17 sec)
mysql>
Here is the run of the second SP using 25 as the full list to compare:
mysql> CALL test.GetMissingIntegers(25);
+--------+
| id_key |
+--------+
| 1 |
| 4 |
| 6 |
| 8 |
| 9 |
| 10 |
| 12 |
| 14 |
| 15 |
| 16 |
| 18 |
| 20 |
| 21 |
| 22 |
| 24 |
| 25 |
+--------+
16 rows in set (0.03 sec)
Query OK, 0 rows affected (0.05 sec)
mysql>
While this solution is OK for small samples, big lists become a major headache. You may want to keep the temp table (don't use CREATE TEMPORARY TABLE over and over, use CREATE TABLE just once) permamnently loaded with the numbers 1 .. MAX(id_key) and populate that permanent temp table via a trigger on id_key_table.
Just a question because I am curious: Are you doing this to see if auto_increment keys from a table can be reused ???
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With