php (fuzzy) search matching

1 Answer

Unfortunately, doing this in PHP is prohibitively expensive on large data sets (high CPU and memory utilization). However, you can certainly apply the algorithm to small data sets.

To expand on how you can create a server meltdown: a couple of built-in PHP functions will measure how close two strings are: levenshtein (an edit distance, where lower means closer) and similar_text (a similarity score, where higher means closer).
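As a quick illustration of the two functions named above, here is a minimal sketch that compares one pair of strings both ways. The variable names are only for illustration.

```php
<?php
// Edit distance: the number of single-character inserts, deletes, or
// substitutions needed to turn one string into the other (lower = closer).
$distance = levenshtein('Apple', 'Apples');

// similar_text() returns the number of matching characters and, through the
// optional third argument (passed by reference), a similarity percentage
// (higher = closer).
$common = similar_text('Apple', 'Apples', $percent);

printf("levenshtein: %d\n", $distance);
printf("similar_text: %d common chars, %.1f%%\n", $common, $percent);
```

Both functions are case-sensitive, so "apple" and "Apple" count as different strings unless you normalize the case first.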

Dummy data: (pretend they're news headlines)

$titles = <<< EOF
Apple
Apples
Orange
Oranges
Banana
EOF;

$titles = explode("\n", $titles );

At this point, $titles should just be an array of strings. Now, create a matrix and compare each headline against EVERY other headline for similarity. In other words, for 5 headlines, you will get a 5 x 5 matrix (25 entries), and for n headlines, n x n. That's where the CPU and memory go.

That's why this method (via PHP) doesn't scale to thousands of entries: 1,000 headlines already means 1,000,000 comparisons. But if you wanted to:

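$matches = array();
foreach( $titles as $title ) {
    // one row per headline: distance from $title to every headline
    $matches[$title] = array();
    foreach( $titles as $compare_to ) {
        $matches[$title][$compare_to] = levenshtein( $compare_to, $title );
    }
    // sort the row so the closest matches come first, keeping the keys
    asort( $matches[$title], SORT_NUMERIC  );
}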

At this point what you basically have is a matrix of "text distances." Conceptually (this is not the actual data structure) it looks like the table below. Note the 0 values running down the diagonal: that's where the matching loop compared a word with itself, and two identical words are -- well, identical.

       Apple Apples Orange Oranges Banana
Apple    0     1      5      6       6
Apples   1     0      6      5       6
Orange   5     6      0      1       5
Oranges  6     5      1      0       5
Banana   6     6      5      5       0
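The table above is symmetric around the diagonal, which suggests one cheap saving: compute each pair once and mirror it. The sketch below assumes the default levenshtein costs (where the distance is the same in both directions), and distance_matrix is a hypothetical helper name, not something from the original loop. It still grows quadratically, so it only roughly halves the work.

```php
<?php
// Hypothetical helper: builds the full distance matrix while calling
// levenshtein() only once per unordered pair of titles.
function distance_matrix(array $titles): array
{
    $titles = array_values($titles);
    $count = count($titles);
    $matrix = [];

    for ($i = 0; $i < $count; $i++) {
        // A string is always at distance 0 from itself (the diagonal).
        $matrix[$titles[$i]][$titles[$i]] = 0;

        for ($j = $i + 1; $j < $count; $j++) {
            $d = levenshtein($titles[$i], $titles[$j]);
            // Store the value on both sides of the diagonal.
            $matrix[$titles[$i]][$titles[$j]] = $d;
            $matrix[$titles[$j]][$titles[$i]] = $d;
        }
    }

    return $matrix;
}

$matrix = distance_matrix(['Apple', 'Apples', 'Orange', 'Oranges', 'Banana']);
print_r($matrix['Apple']);
```

As with the original loop, titles are used as array keys, so duplicate headlines collapse into a single row.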

The actual $matches array looks sort of like this (truncated):

Array
(
    [Apple] => Array
        (
            [Apple] => 0
            [Apples] => 1
            [Orange] => 5
            [Banana] => 6
            [Oranges] => 6
        )

    [Apples] => Array
        (
      ...

Anyhow, it's up to you to determine (by experimentation) a numerical distance cutoff that catches most of the matches you want, and then apply it. Otherwise, read up on Sphinx (sphinx-search) and use it, since it does have PHP libraries.
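Applying a cutoff could look something like the sketch below. The close_matches helper and the cutoff of 2 are assumptions for illustration only; the right value depends on your data and has to be found by trial.

```php
<?php
// Hypothetical helper: returns the titles within $cutoff edits of $needle,
// keyed by title, with the closest matches first.
function close_matches(string $needle, array $titles, int $cutoff): array
{
    $found = [];
    foreach ($titles as $title) {
        $distance = levenshtein($needle, $title);
        if ($distance <= $cutoff) {
            $found[$title] = $distance;
        }
    }
    // Sort by distance while keeping the title keys.
    asort($found, SORT_NUMERIC);
    return $found;
}

$titles = ['Apple', 'Apples', 'Orange', 'Oranges', 'Banana'];
$result = close_matches('Aple', $titles, 2); // cutoff of 2 is an arbitrary guess
print_r($result);
```

With the dummy data, the misspelled "Aple" comes back with Apple (distance 1) and Apples (distance 2), and everything else falls outside the cutoff. A fixed cutoff tends to be too loose for short strings and too strict for long ones, so expect to tune it.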

Orange you glad you asked about this?

146

answered Oct 23 '22 02:10

pp19dd

Tags:

php

mysql

full-text-search

chris
