Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating unique codes that are different in two digits

I want to generate unique code numbers (composed of 7 digits exactly). The code number is generated randomly and saved in MySQL table.

I have another requirement. All generated codes should differ in at least two digits. This is useful to prevent errors while typing the user code. Hopefully, it will prevent referring to another user code while doing some operations as it is more unlikely to miss two digits and match another existing user code.

The generate algorithm works simply like:

  1. Retrieve all previous codes if any from MySQL table.
  2. Generate one code at a time.
  3. Subtract the generated code with all previous codes.
  4. Check the number of non-zero digits in the subtraction result.
  5. If it is > 1, accept the generated code and add it to previous codes.
  6. Otherwise, jump to 2.
  7. Repeat steps from 2 to 6 for the number of requested codes.
  8. Save the generated codes in the DB table.

The algorithm works fine, but the problem is related to performance. It takes a very long to finish generating the codes when requesting to generate a large number of codes like: 10,000.

The question: Is there any way to improve the performance of this algorithm?

I am using perl + MySQL on Ubuntu server if that matters.

like image 842
Khaled Avatar asked Mar 21 '11 15:03

Khaled


3 Answers

Have you considered a variant of the Luhn algorithm? Luhn is used to generate a check digit for strings of numbers in lots of applications, including credit card account numbers. It's part of the ISO-7812-1 standard for generating identifiers. It will catch any number that is entered with one incorrect digit, which implies any two valid numbers differ in a least two digits.

Check out Algorithm::LUHN in CPAN for a perl implementation.

like image 176
ataylor Avatar answered Oct 06 '22 21:10

ataylor


Don't retrieve the existing codes, just generate a potential new code and see if there are any conflicting ones in the database:

SELECT code FROM table WHERE abs(code-?) regexp '^[1-9]?0*$';

(where the placeholder is the newly generated code).

Ah, I missed the generating lots of codes at once part. Do it like this (completely untested):

my @codes = existing_codes();

my $frontwards_index = {};
my $backwards_index = {};
for my $code (@codes) {
    index_code($code, $frontwards_index);
    index_code(reverse($code), $backwards_index);
}

my @new_codes = map generate_code($frontwards_index, $backwards_index), 1..10000;

sub index_code {
    my ($code, $index) = @_;
    push @{ $index{ substr($code, 0, length($code)/2) } }, $code;
    return;
}

sub check_index {
    my ($code, $index) = @_;
    my $found = grep { ($_ ^ $code) =~ y/\0//c <= 1 } @{ $index{ substr($code, 0, length($code)/2 } };
    return $found;
}

sub generate_code {
    my ($frontwards_index, $backwards_index) = @_;

    my $new_code;
    do {
        $new_code = sprintf("%07d", rand(10000000));
    } while check_index($new_code, $frontwards_index)
        || check_index(reverse($new_code), $backwards_index);
    index_code($new_code, $frontwards_index);
    index_code(reverse($new_code), $backwards_index);
    return $new_code;
}
like image 26
ysth Avatar answered Oct 06 '22 21:10

ysth


Put the numbers 0 through 9,999,999 in an augmented binary search tree. The augmentation is to keep track of the number of sub-nodes to the left and to the right. So for example when your algorithm begins, the top node should have value 5,000,000, and it should know that it has 5,000,000 nodes to the left, and 4,999,999 nodes to the right. Now create a hashtable. For each value you've used already, remove its node from the augmented binary search tree and add the value to the hashtable. Make sure to maintain the augmentation.

To get a single value, follow these steps.

  1. Use the top node to determine how many nodes are left in the tree. Let's say you have n nodes left. Pick a random number between 0 and n. Using the augmentation, you can find the nth node in your tree in log(n) time.
  2. Once you've found that node, compute all the values that would make the value at that node invalid. Let's say your node has value 1,111,111. If you already have 2,111,111 or 3,111,111 or... then you can't use 1,111,111. Since there are 8 other options per digit and 7 digits, you only need to check 56 possible values. Check to see if any of those values are in your hashtable. If you haven't used any of those values yet, you can use your random node. If you have used any of them, then you can't.
  3. Remove your node from the augmented tree. Make sure that you maintain the augmented information.
  4. If you can't use that value, return to step 1.
  5. If you can use that value, you have a new random code. Add it to the hashtable.

Now, checking to see if a value is available takes O(1) time instead of O(n) time. Also, finding another available random value to check takes O(log n) time instead of... ah... I'm not sure how to analyze your algorithm.

Long story short, if you start from scratch and use this algorithm, you will generate a complete list of valid codes in O(n log n). Since n is 10,000,000, it will take a few seconds or something.

Did I do the math right there everybody? Let me know if that doesn't check out or if I need to clarify anything.

like image 2
Dave Aaron Smith Avatar answered Oct 06 '22 21:10

Dave Aaron Smith