Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL - inserting 70000 random unique strings efficiently

Tags:

php

random

mysql

I'm working on a project, in which I should generate at least 70000 codes which contain 8 alphanumeric characters. The codes must be unique. currently I am using php to generate these codes with the following function :

function random_unique_serial($length, PDO $conn) {
    $codeCheck=FALSE;
    while (!$codeCheck) {
        $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
        $charactersLength = strlen($characters);
        $randomCode = '';
        for ($i = 0; $i < $length; $i++) {
            $randomCode .= $characters[rand(0, $charactersLength - 1)];
        }
        $sql = "SELECT * FROM codes WHERE code=:code";
        $st = $conn->prepare($sql);
        $st->bindvalue(":code", $randomCode, PDO::PARAM_STR);
        $st->execute();
        $count = $st->rowcount();
        if ($count==0) {
            $codeCheck=TRUE;
        } else {
            $codeCheck=FALSE;
        }
    }
    return $randomCode;
}

As you see this codes checks the database for every single code generated to make sure it is not a duplicate. This should work theoretically. However this is very slow and causes the request to time out. I tried increasing execution time but that also didn't help.

Then I decided to use a database side approach and used this solution : Generating a random & unique 8 character string using MySQL

This is also very slow and some of the generated codes are less than 8 characters long.

could you please suggest a better solution?

like image 419
MehdiB Avatar asked Jan 18 '17 11:01

MehdiB


People also ask

How to generate Unique code in MySQL?

Use UUID() to Generate Random and Unique Strings in MySQL Another method of producing the 8-character string in MySQL is LEFT(UUID(),8) , as given above. The RFC 4122 (Universally Unique Identifier URN Namespace) specified the UUID() (Universal Unique Identifier) which is a 128-bit long value.

How can we get a random number between 1 and 100 in MySQL?

select FLOOR( RAND() * (maximumValue-minimumValue) + minimumValue) as anyVariableName; Let us check with some maximum and minimum value. The maximum value we are considering is 200 and minimum is 100. The random number will be between 100 and 200 including 100 and 200 itself.

How do I generate a random character in MySQL?

There's no built-in way to generate a random string in MySQL, so you need to create a workaround using the provided functions. One of the easiest ways to generate a random string is to use a combination of the SUBSTR() function, the MD5() function, and the RAND() function.

How do you generate unique strings in SQL Server?

If you need a string of random digits up to 32 characters for test data or just need some junk text to fill a field, SQL Server's NEWID() function makes this simple. NEWID() is used to create a new GUID (globally unique identifier), and we can use that as a base to get a string of random characters.


2 Answers

Create your table structure:

CREATE TABLE t (code CHAR(8) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL UNIQUE);

Define a PHP function to generate a random string:

function random_string(integer $length = 8): string {
    return bin2hex(mcrypt_create_iv(ceil($length/2), MCRYPT_DEV_URANDOM));
}

Use PHP to build a multi-value INSERT statement, ram that into the database, count how many were inserted, and repeat until the required number are inserted:

function insert_records(\PDO $pdo, integer $need = 70000): void {
    $have = 0;
    while ($have < $need) {
        // generate multi value INSERT
        $sql = 'INSERT IGNORE INTO t VALUES ';
        for ($i = 1; $i < $need; $i++) {
            $sql .= sprintf('("%s"),', random_string());
        }
        $sql .= sprintf('("%s");', random_string());

        // pass to database and ask how many records were inserted
        $result = $pdo->query($sql);
        $count  = $result->rowCount();

        // adjust bookkeeping values so we know how many we have and how many
        // we need
        $need -= $count;
        $have += $count;
    }
}

On my machine (Amazon Linux c2.small), the run time for 70k records is about 2 seconds:

real    0m2.136s
user    0m1.256s
sys     0m0.212s

The relevant tricks in this code, to make it fast, are:

  • Sending the minimum number of SQL statements necessary to generate the needed number of records. Using a multi-value insert - INSERT INTO ... VALUES (), (), ... (); - really helps this as it minimizes the total amount of statement processing MySQL has to do and it tells us how many records were inserted without having to do another query.
  • Using INSERT IGNORE to avoid having to check for the existence of every single code we insert, which is really really expensive.
  • Using the fastest possible string generating function we can for our needs. In my experience, mcrypt_create_iv is a fast generator that is cryptographically secure, so it provides an ideal balance of security and performance.
  • Using the ASCII character set and fixed width CHAR to remove unnecessary byte overhead and UNIQUE to enforce de-duplication.
like image 200
bishop Avatar answered Oct 22 '22 07:10

bishop


I'd do that with mysql alone, a stored procedure will help - you can still create and call that with php. The stored procedure uses the substring of a md5 hash, created from rand(). The column where the string is to be inserted needs to be unique. Replace table name and column in in this part:

insert ignore into foo (`uniqueString`)

delimiter //
create procedure createRandomString (in num int)
  begin
    declare i int default 0;
    while i < num do
      insert ignore into foo (`uniqueString`) values (substr(md5(rand()), 1, 8));
      set i = i + 1;
    end while;
  end //
delimiter ;

call createRandomString (70000);

I did a quick test, I got 69934 random unique strings inserted on a remote db (from the 70000 runs) within 10s 603ms. Running the same procedure with 80000 as parameter

call createRandomString(80000);

runs 12s 434ms for me, inserting 77354 rows - so you have at least 70000 in little time.

Will produce results like this:

enter image description here


If you want to make sure to have exactly the number of rows inserted as called, use this (but note to set the max_sp_recursion_depth to what it was before after calling the procedure, default is 0):

delimiter //
create procedure createRandomString2 (in num int)
  begin
    declare i int default 0;
    while i < num do
      insert ignore into foo (uniqueString) values (substr(md5(rand()), 1, 8));
      set i = i + 1;
    end while;
    if (select count(id) from foo) < num then
      call createRandomString2(num - (select count(id) from foo));
   END IF;
  end //
delimiter ;

set max_sp_recursion_depth = 100; 
call createRandomString7 (70000);
set max_sp_recursion_depth = 0;
like image 40
baao Avatar answered Oct 22 '22 07:10

baao