Optimal way to find similar value from a large table

Tags:

sql

mysql

I have a MySQL database that stores more than 1,000,000 names. My application's task is a bit unusual: it not only searches for names in the database, but also finds similar names. For example, if the name christian is entered, the application should suggest names such as christine, chris, etc. What is the optimal way to do this without using the LIKE clause? The suggestions only need to differ in the last part of the name.

209

asked Jun 11 '11 16:06

user794091

2 Answers

If you also want names that are similar by sound, something like SOUNDEX() could help: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex

Otherwise, … LIKE 'chri%' doesn't seem like a bad idea to me.

If you really want to match just the first characters without LIKE, you can use SUBSTRING().
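To make the two ideas above concrete, here is a minimal PHP sketch over an in-memory list rather than a MySQL table. The helpers samePrefix() and sameSound() are hypothetical names invented for illustration, and the prefix length of 4 is an assumption taken from the 'chri%' example. PHP's soundex() stands in for MySQL's SOUNDEX() here; the two are not identical, so treat this as an approximation of what the query would do.

```php
<?php
// Hypothetical helper: true when $candidate shares its first $len characters
// with $input (case-insensitive), similar in spirit to comparing
// SUBSTRING(name, 1, $len) against the start of the input.
function samePrefix(string $input, string $candidate, int $len = 4): bool
{
    return strncasecmp($input, $candidate, $len) === 0;
}

// Hypothetical helper: true when both names have the same Soundex code.
function sameSound(string $input, string $candidate): bool
{
    return soundex($input) === soundex($candidate);
}

// Sample data only; a real application would query the names table.
$names = array('chris', 'christine', 'mike');

foreach ($names as $name) {
    printf("%-10s prefix=%s sound=%s\n",
        $name,
        samePrefix('christian', $name) ? 'yes' : 'no',
        sameSound('christian', $name) ? 'yes' : 'no');
}
```

Note that chris matches christian by prefix but not by Soundex code, so the two checks are not interchangeable. Also keep in mind that PHP's soundex() always returns four characters, while MySQL's SOUNDEX() can return a longer string, and that a prefix pattern such as 'chri%' can normally use an index on the name column, whereas wrapping the column in SUBSTRING() usually cannot.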

198

answered Sep 21 '22 01:09

flori

You could use PHP's metaphone() function to generate the metaphone code for each name and store the codes alongside the names.

<?php
print "chris" . "\t" . metaphone("chris") . "\n";
print "christian" . "\t" . metaphone("christian") . "\n";
print "christine" . "\t" . metaphone("christine") . "\n";

# prints:
# chris      XRS
# christian  XRSXN
# christine  XRSTN

You can then use a Levenshtein distance algorithm (either in PHP [http://php.net/manual/en/function.levenshtein.php] or in MySQL [http://www.artfulsoftware.com/infotree/queries.php#552]) to calculate the distance between the metaphone codes. In my test below, a distance of 2 or less seemed to indicate the level of similarity that you are looking for.

<?php
// Each entry pairs a name with its metaphone code.
$names = array(
        array('mike', metaphone('mike')),
        array('chris', metaphone('chris')),
        array('christian', metaphone('christian')),
        array('christine', metaphone('christine')),
        array('michelle', metaphone('michelle')),
        array('mick', metaphone('mick')),
        array('john', metaphone('john')),
        array('joseph', metaphone('joseph'))
);

// Compare every name against every name in the list.
foreach ($names as $name) {
        _compare($name);
}

// Print the Levenshtein distance between $n's metaphone code and each code in $names.
function _compare($n) {
        global $names;
        $name = $n[0];
        $meta = $n[1];

        foreach ($names as $cname) {
                printf("The distance between %s and %s is %d\n",
                  $name, $cname[0], levenshtein($meta, $cname[1]));
        }
}
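
Building on the test above, the threshold can be turned into a small filter. This is only a sketch: suggestNames() is a hypothetical helper, the default of 2 comes from the observation in this answer rather than from any tuning, and the codes are computed on the fly here, whereas with a large table you would read the stored codes instead.

```php
<?php
// Hypothetical helper: return the candidates whose metaphone code is within
// $maxDistance edits of the input's metaphone code, closest first.
function suggestNames(string $input, array $candidates, int $maxDistance = 2): array
{
    $inputCode = metaphone($input);
    $hits = array();

    foreach ($candidates as $candidate) {
        $distance = levenshtein($inputCode, metaphone($candidate));
        if ($distance <= $maxDistance) {
            $hits[$candidate] = $distance;
        }
    }

    asort($hits);              // sort by distance, keeping the names as keys
    return array_keys($hits);
}

// Sample data only; mike should be filtered out by the threshold.
print implode(', ', suggestNames('christian', array('mike', 'chris', 'christine'))) . "\n";
```

With the codes printed earlier (XRSXN, XRSTN, XRS), christine is one edit away from christian and chris is two, so both pass the threshold. Comparing against every row this way does not scale to a million names by itself, so in practice you would probably narrow the candidates first, for example by prefix.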

answered Sep 19 '22 01:09

spuriousdata
