
Checking and preventing similar strings during insertion in MySQL

Brief info

I have 3 tables:

Set:

id
name

SetItem:

set_id
item_id
position

TempSet:

id

I have a function that generates new random combinations from the Item table. After each successful generation, I create a new row in the Set table, get its id, and add all the item ids to the SetItem table.
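The storage step described above (one Set row, then one SetItem row per item) can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module in place of MySQL, a hypothetical helper save_combination, and that position records each item's 1-based order in the combination. In MySQL itself, the id of the new Set row is available from LAST_INSERT_ID().

```python
import sqlite3


def save_combination(conn: sqlite3.Connection, name: str, item_ids: list[int]) -> int:
    """Hypothetical helper: store one accepted combination.

    Inserts a row into "Set", reads back its generated id, then inserts one
    SetItem row per item, using the 1-based order as the position.
    """
    # "Set" is quoted because SET is an SQL keyword
    cur = conn.execute('INSERT INTO "Set" (name) VALUES (?)', (name,))
    set_id = cur.lastrowid  # id generated for the new Set row
    conn.executemany(
        "INSERT INTO SetItem (set_id, item_id, position) VALUES (?, ?, ?)",
        [(set_id, item_id, pos) for pos, item_id in enumerate(item_ids, start=1)],
    )
    conn.commit()
    return set_id


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Set" (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute("CREATE TABLE SetItem (set_id INTEGER, item_id INTEGER, position INTEGER)")
new_id = save_combination(conn, "first", [7, 3, 9])
```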

Problem

Every time before generating a new combination, I clear the TempSet table, fill it with the new item ids, and check the similarity percentage by comparing it against the previous combinations in the SetItem table. If the new combination's similarity is greater than or equal to 30%, I need to reject this combination and generate a new one.

Similarity means the presence of the new combination's elements in previously generated combinations. So, the idea is:

if more than 3 elements of the newly generated set are repeated in some previously generated set, reject it and try to generate another combination.

Here is the function that generates new combinations:

CREATE DEFINER = `root` @`localhost` FUNCTION `gen_uniq_perm_by_kw` (
    comboSize INT ( 5 ),
    tries INT ( 3 )
) RETURNS text CHARSET utf8 SQL SECURITY INVOKER
BEGIN
    iterat :
    LOOP
        DELETE FROM `TempSet`;

        INSERT INTO `TempSet` ( `id` ) (
            SELECT `i`.`id`
            FROM `Item` AS `i`
            ORDER BY RAND( )
            LIMIT comboSize
        );

        IF (
            SELECT 1
            FROM `SetItem`
            GROUP BY `set_id`
            HAVING sum(
                CASE
                    WHEN EXISTS (
                        SELECT id
                        FROM `TempSet`
                        WHERE `id` = `item_id`
                        LIMIT 1
                    ) THEN 1 ELSE 0
                END
            ) / count( 1 ) * 100 >= 30
            LIMIT 1
        ) < 1 THEN
            RETURN ( SELECT GROUP_CONCAT( id SEPARATOR '-' ) FROM `TempSet` );
        END IF;

        SET tries := tries - 1;
        IF tries = 0 THEN
            RETURN NULL;
        END IF;
    END LOOP iterat;
END

When I test it, the function returns NULL even when none of the newly generated combination's elements exist in any previously generated combination.

My question is, what am I doing wrong?

asked Jul 12 '17 12:07 by Ilaroot Veyila




1 Answer

My question is, what am I doing wrong?

You don't have any data in your SetItem table.

Edit: You commented that this is wrong; you do have 300k rows in SetItem.


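I got an example working. You can't use a scalar subquery comparison like you're doing. When no previous set reaches the 30% threshold, the subquery returns no rows, so its value is NULL, and NULL < 1 is NULL rather than true; IF treats NULL as false. When some set does reach the threshold, the subquery returns 1, and 1 < 1 is false. So the condition is never true, the loop runs until tries reaches 0, and the function returns NULL. Using EXISTS avoids comparing against NULL. I got it working this way: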

DROP FUNCTION IF EXISTS gen_uniq_perm_by_kw;
DELIMITER ;;
CREATE DEFINER = `root` @`localhost` FUNCTION `gen_uniq_perm_by_kw` (comboSize INT, tries INT) RETURNS text CHARSET utf8 SQL SECURITY INVOKER
BEGIN
        iterat :
        LOOP
                -- Start each attempt with an empty TempSet
                DELETE FROM `TempSet`;

                -- Pick comboSize random items as the candidate combination
                INSERT INTO `TempSet` (`id`)
                SELECT `i`.`id` FROM `Item` AS `i` ORDER BY RAND() LIMIT comboSize;

                -- Accept the candidate only if no previous set shares 30% or more of its items
                IF NOT EXISTS(
                        SELECT set_id,
                                SUM(CASE WHEN EXISTS (SELECT id FROM `TempSet` WHERE `id` = `item_id` LIMIT 1) THEN 1 ELSE 0 END) AS group_sum,
                                COUNT(*) AS group_count
                        FROM `SetItem`
                        GROUP BY `set_id`
                        HAVING group_sum * 10 / group_count >= 3
                ) THEN
                        RETURN (SELECT GROUP_CONCAT(id SEPARATOR '-') FROM `TempSet`);
                END IF;

                SET tries = tries - 1;

                IF tries = 0 THEN
                        RETURN NULL;
                END IF;
        END LOOP iterat;
END ;;
DELIMITER ;
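The reason the original IF never succeeds is SQL's NULL handling, which can be checked in isolation. This sketch runs the same kind of expressions in SQLite through Python's built-in sqlite3 module (an assumption for portability; MySQL follows the same three-valued logic for these expressions). SELECT 1 WHERE 0 stands in for a subquery whose HAVING matched nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A scalar subquery that matches no rows evaluates to NULL ...
no_match = conn.execute("SELECT (SELECT 1 WHERE 0)").fetchone()[0]
# ... and comparing NULL with anything yields NULL, not TRUE.
null_compare = conn.execute("SELECT (SELECT 1 WHERE 0) < 1").fetchone()[0]
# When a row does match, the subquery yields 1, and 1 < 1 is FALSE (0).
match_compare = conn.execute("SELECT (SELECT 1 WHERE 1) < 1").fetchone()[0]
# EXISTS / NOT EXISTS always return 0 or 1, never NULL.
exists_check = conn.execute("SELECT NOT EXISTS (SELECT 1 WHERE 0)").fetchone()[0]

print(no_match, null_compare, match_compare, exists_check)  # None None 0 1
```

Neither outcome of the original comparison is TRUE, which matches the symptom of the function always returning NULL.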

I also got it working in a simpler way, without using the SUM and extra subquery:

DROP FUNCTION IF EXISTS gen_uniq_perm_by_kw;
DELIMITER ;;
CREATE DEFINER = `root` @`localhost` FUNCTION `gen_uniq_perm_by_kw` (comboSize INT, tries INT) RETURNS text CHARSET utf8 SQL SECURITY INVOKER
BEGIN
        iterat :
        LOOP
                DELETE FROM `TempSet`;

                INSERT INTO `TempSet` (`id`)
                SELECT `i`.`id` FROM `Item` AS `i` ORDER BY RAND() LIMIT comboSize;

                -- COUNT(t.id) counts only the set's items found in TempSet,
                -- COUNT(*) counts all of the set's items; accept only if no set reaches 30%
                IF NOT EXISTS(
                        SELECT s.set_id,
                                COUNT(t.id) AS group_matches,
                                COUNT(*) AS group_count
                        FROM SetItem AS s LEFT OUTER JOIN TempSet AS t ON t.id = s.item_id
                        GROUP BY s.set_id
                        HAVING group_matches * 10 / group_count >= 3
                ) THEN
                        RETURN (SELECT GROUP_CONCAT(id SEPARATOR '-') FROM `TempSet`);
                END IF;

                SET tries = tries - 1;

                IF tries = 0 THEN
                        RETURN NULL;
                END IF;
        END LOOP iterat;
END ;;
DELIMITER ;
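The LEFT JOIN check can be tried without a MySQL server. This is a sketch under a few assumptions: it uses SQLite through Python's built-in sqlite3 module, the tables are reduced to the columns the check reads, accept_candidate is a hypothetical helper, and the threshold is written as the cross-multiplication COUNT(t.id) * 10 >= COUNT(*) * 3 because SQLite's / on integers performs integer division.

```python
import sqlite3

# One row per previous set that shares 30% or more of its items with TempSet.
# Cross-multiplying avoids SQLite's integer division.
SIMILAR_SETS = """
    SELECT s.set_id
    FROM SetItem AS s
    LEFT OUTER JOIN TempSet AS t ON t.id = s.item_id
    GROUP BY s.set_id
    HAVING COUNT(t.id) * 10 >= COUNT(*) * 3
"""


def accept_candidate(conn: sqlite3.Connection, candidate: list[int]) -> bool:
    """Hypothetical helper: load the candidate into TempSet and accept it
    only if no similar previous set exists (the NOT EXISTS test)."""
    conn.execute("DELETE FROM TempSet")
    conn.executemany("INSERT INTO TempSet (id) VALUES (?)", [(i,) for i in candidate])
    similar = conn.execute(f"SELECT EXISTS ({SIMILAR_SETS})").fetchone()[0]
    return not similar


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SetItem (set_id INTEGER, item_id INTEGER, position INTEGER)")
conn.execute("CREATE TABLE TempSet (id INTEGER)")
# One previously generated set containing items 1..10
conn.executemany(
    "INSERT INTO SetItem (set_id, item_id, position) VALUES (?, ?, ?)",
    [(1, item, item) for item in range(1, 11)],
)
```

With this data, a candidate sharing 2 of the stored set's 10 items (20%) is accepted, while one sharing 3 of them (30%) is rejected.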
answered Sep 30 '22 07:09 by Bill Karwin