Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String likeness algorithms

I have two strings (they're going to be descriptions in a simple database eventually), let's say they're

  1. String A: "Apple orange coconut lime jimmy buffet"
  2. String B: "Car bicycle skateboard"

What I'm looking for is this. I want a function that will have the input "cocnut", and have the output be "String A"

We could have differences in capitalization, and the spelling won't always be spot on. The goal is a 'quick and dirty' search if you will.

Are there any .net (or third party), or recommend 'likeness algorithms' for strings, so I could check that the input has a 'pretty close fragment' and return it? My database is going to have liek 50 entries, tops.

like image 968
greggorob64 Avatar asked Mar 08 '13 20:03

greggorob64


People also ask

What is similarity algorithm?

Similarity algorithms compute the similarity of pairs of nodes based on their neighborhoods or their properties. Several similarity metrics can be used to compute a similarity score.

How does levenshtein algorithm work?

The Levenshtein distance is a string metric for measuring difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.

What is string matching algorithm in DAA?

String Matching Algorithm is also called "String Searching Algorithm." This is a vital class of string algorithm is declared as "this is the method to find a place where one is several strings are found within the larger string." Given a text array, T [1.....n], of n character and a pattern array, P [1......


1 Answers

What you’re searching for is known as the edit distance between two strings. There exist plenty of implementations – here’s one from Stack Overflow itself.

Since you’re searching for only part of a string what you want is a locally optimal match rather than a global match as computed by this method.

This is known as the local alignment problem and once again it’s easily solvable by an almost identical algorithm – the only thing that changes is the initialisation (we don’t penalise whatever comes before the search string) and the selection of the optimum value (we don’t penalise whatever comes after the search string).

like image 71
Konrad Rudolph Avatar answered Sep 28 '22 17:09

Konrad Rudolph