Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Searching in database with scrambled words in SQLite

Tags:

search

sqlite

I am wondering if its possible to search in the database with the given scrambled words.

I have a mobs table in database and it holds the name of the monster names

If given monster name is A Golden Dregon or A Golden Dfigon or A Gelden Dragon I want it to find A Golden Dragon or with the matches that close to it from database. Usually one or two letters at max is given like this as scrambled.

Is that possible with just SQL queries? Or should I build the query by parsing the given monster name?

I am using LUA for the code side.

like image 646
Valour Avatar asked Feb 10 '18 23:02

Valour


1 Answers

I have come to know this search type as a fuzzy search. I mainly program in JS and use fuse.js all the time for this kind of problem.

Fuzzy Searches are based on the Levenshtein algorithm that rate the distance of two strings. When you have this distance value you can sort or drop elements from a list based on the score.

I found the algorithm in lua here.

function levenshtein(s, t)
  local s, t = tostring(s), tostring(t)
  if type(s) == 'string' and type(t) == 'string' then
    local m, n, d = #s, #t, {}
    for i = 0, m do d[i] = { [0] = i } end
    for j = 1, n do d[0][j] = j end
    for i = 1, m do
      for j = 1, n do
        local cost = s:sub(i,i) == t:sub(j,j) and 0 or 1
        d[i][j] = math.min(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
      end
    end
    return d[m][n]
  end
end

As explained in the site you compare two strings like so and get a score based on the distance of them, then sort or drop the items being search based on the scores given. As this is CPU expensive I would suggest caching or use a memoize function to store common mistakes.

  levenshtein('referrer', 'referrer') -- zero distance
  >>> 0
  levenshtein('referrer', 'referer') -- distance of one character
  >>> 1
  levenshtein('random', 'strings') -- random big distance
  >>> 6 

Got a simple version of it working in lua here I must say lua is an easy language to pick up and start coding with.

local monsters = {'A Golden Dragon', 'Goblins', 'Bunny', 'Dragoon'}

function levenshtein(s, t)
  local s, t = tostring(s), tostring(t)
  if type(s) == 'string' and type(t) == 'string' then
    local m, n, d = #s, #t, {}
    for i = 0, m do d[i] = { [0] = i } end
    for j = 1, n do d[0][j] = j end
    for i = 1, m do
      for j = 1, n do
        local cost = s:sub(i,i) == t:sub(j,j) and 0 or 1
        d[i][j] = math.min(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
      end
    end
    return d[m][n]
  end
end

--Fuzzy Search Returns the Best Match in a list
function fuzzySearch(list, searchText)
    local bestMatch = nil;
    local lowestScore = nil;

    for i = 1, #list do
        local score = levenshtein(list[i], searchText)
        if lowestScore == nil or score < lowestScore then
            bestMatch = list[i]
            lowestScore = score
        end 
    end

    return bestMatch
end

print ( fuzzySearch(monsters, 'golen dragggon') )
print ( fuzzySearch(monsters, 'A Golden Dfigon') )
print ( fuzzySearch(monsters, 'A Gelden Dragon') )

print ( fuzzySearch(monsters, 'Dragooon') ) --should be Dragoon
print ( fuzzySearch(monsters, 'Funny') ) --should be Bunny
print ( fuzzySearch(monsters, 'Gob') ) --should be Goblins

Output

A Golden Dragon
A Golden Dragon
A Golden Dragon
Dragoon
Bunny
Goblins

For SQL

You can try to do this same algorithm in T-SQL as talked about here.

In SQLlite there is an extension called editdist3 which also uses this algorithm the docs are here.

like image 135
Michael Warner Avatar answered Oct 04 '22 06:10

Michael Warner