Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP / SQL - Improving search feature / fuzzy search

Tags:

sql

php

search

I am trying to create a product search for my site, where a user can search for products in multiple languages and (hopefully) get fuzzy search results if there is no exact match.

  • I have a pro_search table that has columns id, pro_id, en, de, es, fr, it.
  • The pro_id column refers to the id of the products in their own table.
  • The en, de, es, fr, it columns have the translated meta of each product in various languages.
  • The meta is just keywords seperated by spaces
  • $term is the search term.
  • $lang refers to the users chosen language

So first I do a basic 'LIKE' SQL query to see if there are matches, if there are no results from this I query all the products and create an array sorted by their similarity using the similar_text() function

For example I search 'shirt' this is fine if the meta for this product just includes the word 'shirt', but if the meta includes 'blue branded tshirt' this is being more descriptive and gives the user a chance to search by brand but means that the search will more than likely go fuzzy rather than be found with a LIKE SQL query.

This is kind of working but I was wondering how this could be improved, is there a better way of searching or how do people normally do it? Should I be splitting the meta into each individual keywords and try to see how many words match rather than matching the term to the whole meta?

    $ids = [];

    $params = ['%'.$term.'%'];
    $sql = "SELECT * FROM pro_search WHERE $lang LIKE ?";
    $stmt = DB::run($sql,$params);

    $count = $stmt->rowCount();
    if($count > 0){

        // product search
        while ($row = $stmt->fetch(PDO::FETCH_ASSOC)){
            $id = $row["pro_id"];
            array_push($ids,$id);
        }
        show_products($ids);

    }else{

        // product fuzzy search
        $sql = "SELECT * FROM pro_search";
        $stmt = DB::run($sql);
        while ($row = $stmt->fetch(PDO::FETCH_ASSOC)){
            $id = $row["pro_id"];
            $result = $row[$lang];
            similar_text($term,$result,$similarity);
            $similar_array[$similarity][] = $id;
        }

        $closest_match = array_keys($similar_array);
        rsort($closest_match);
        $match_count = count($closest_match);

        for($i=0; $i<$match_count; $i++){
            foreach($similar_array[$closest_match[$i]] as $id){
                array_push($ids,$id);
            }
        }
        show_products($ids);
    }

I have asked similar questions before and people have pointed me to different ways of comparing the term against the meta (such as levenshtein), but everything I've seen has been comparing two simple words (like apples and oranges) and this just isn't good enough for a real life application with thousands of products and a user could search for literally anything (as in $term='literally anything';)

Key Questions:

  • Should my meta have just the product name or multiple relevant keywords (too many keywords means an individual word is less similar to the whole)?
  • If I have multiple keywords in the meta should I be taking each individual keyword and comparing it against the search term?
  • Also would it be possible to maybe have negative keywords for individual products.
like image 543
Paddy Hallihan Avatar asked Mar 13 '19 16:03

Paddy Hallihan


People also ask

What is a fuzzy search?

Lastly, the fuzzy search is an interesting one. A “like search” must contain the search term in one way or another, but a fuzzy search returns all results that are “close enough”. P.S. Another interesting part about fuzzy search is that we can get a “search relevance score” with it, which is a number from 0 to 1.

How to create a search and display form using PHP?

In the simplest design, a “search and display results” only involves 2 steps. Create a simple HTML search form – With a text box, submit button, and pointed to a PHP search script. In the PHP search script, do a SELECT * FROM `TABLE` WHERE `FIELD` LIKE '%SEARCH%' SQL query and output the results in HTML.

How to use PHP search scripts to generate HTML Search results?

In the PHP search script, do a SELECT * FROM `TABLE` WHERE `FIELD` LIKE '%SEARCH%' SQL query and output the results in HTML. But just how is this done exactly? Let us walk through an example in this guide – Read on!

How to connect search engine to database using PHP?

1 Open search.php 2 Start php (<?php ?>) 3 Connect to a database (read comments in following code)


2 Answers

You are looking for Full-Text Searches WITH QUERY EXPANSION

MySQL supports text searching by using the LIKE operator and regular expression. However, when the text column is large and the number of rows in a table is increased, using these methods has some limitations:

  • Performance: MySQL has to scan the whole table to find the exact text based on a pattern in the LIKE statement or pattern in the regular expressions.
  • Flexible search: with the LIKE operator and regular expression searches, it is difficult to have a flexible search query e.g., to find product whose description contains car but not classic.
  • Relevance ranking: there is no way to specify which row in the result set is more relevant to the search terms.

Because of these limitations, MySQL extended a very nice feature so-called full-text search. Technically, MySQL creates an index from the words of the enabled full-text search columns and performs searches on this index. MySQL uses a sophisticated algorithm to determine the rows matched against the search query.

To do that, the columns that will be used for search must be in TEXT type and index of type FULLTEXT, index can be given using ALTER TABLE or CREATE INDEX and if you are using phpMyAdmin to manage your databases, you can do that by going to the Structure of that table, then click on More under Action of that column and choose Fulltext.

After that you can performe a search using MATCH AGAINST syntax. MATCH() takes the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform.

Full-Text Searches WITH QUERY EXPANSION:

In some cases, users want to search for information based on the knowledge that they have. Users use their experience to define keywords to search for information, and typically those keywords are too short.

To help users to find information based on the too-short keywords, MySQL full-text search engine introduces a concept called query expansion.

The query expansion is used to widen the search result of the full-text searches based on automatic relevance feedback (or blind query expansion). Technically, MySQL full-text search engine performs the following steps when the query expansion is used:

  • First, MySQL full-text search engine looks for all rows that match the search query.
  • Second, it checks all rows in the search result and finds the relevant words.
  • Third, it performs a search again based on the relevant words instead of the original keywords provided by the users.

The following example shows you how to search for a product whose product name or meta contains at least one word (shirt tshirt).

SELECT * FROM products WHERE MATCH(product_name,product_meta) AGAINST('shirt tshirt' WITH QUERY EXPANSION)

You can read more info in MYSQL document (the link at the beginning of the answer) and here

Also don't miss How Fine-Tuning MySQL Full-Text Search

like image 118
user2203703 Avatar answered Sep 21 '22 15:09

user2203703


You can think a bit differently if you are still designing the system. In terms of search just do exact search and do in db as suggested earlier because that is much faster - but "learn from each interaction".

  • User types in some term
  • You do exact search, if found great.
  • if not do fuzzy search for each part of the term entered. Still not found, you do soundex. You are trying to find something! But present a long list to the user to filter through.
  • Eventually user selects one. As soon as they do, you add the term they entered to the product they selected.

So basic idea being that you are learning from each interaction and enriching your search set. Also whenever a term is user and user actually clicks on your item, you keep a count on that term-product association as your confidence in that term-product association improved.

Similarly whenever you are present an option, very easily user should be able to say "Not this" and then guide them through a pre-selected hierarchy of your items and eventually when they select one, you save their search term to that product.

So over a few months if you are getting enough users you will have a rich organic search term dataset for your product category with confidence level on each term.

like image 38
Saad Ahmad Avatar answered Sep 20 '22 15:09

Saad Ahmad