Are there any Fuzzy Search or String Similarity Functions libraries written for C#? [closed]

There are similar questions, but none about C# libraries I can use in my source code.

Thank you all for your help.

I've already looked at Lucene, but I need something simpler for finding similar strings, without the overhead of the indexing part.

The answer I accepted has two very simple algorithms, and one uses LINQ too, so it's perfect.

Luca Molteni asked Sep 17 '08 14:09


2 Answers

Levenshtein distance implementation:

  • Using LINQ (not really, see comments)
  • Not using LINQ

I have a .NET 1.1 project in which I use the latter. It's simplistic, but works perfectly for what I need. From what I remember it needed a bit of tweaking, but nothing that wasn't obvious.
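
For readers who want to see what a non-LINQ Levenshtein distance looks like, here is a minimal sketch in current C#. It is not the code behind either item above; the class name `Levenshtein` and method name `Distance` are hypothetical, chosen for illustration. It uses the standard two-row dynamic programming formulation, so it only needs memory proportional to the length of the second string.

```csharp
using System;

// Hypothetical helper class, not taken from the implementations mentioned above.
public static class Levenshtein
{
    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn source into target.
    public static int Distance(string source, string target)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (target == null) throw new ArgumentNullException(nameof(target));

        // previous[j] = distance between the first (i - 1) chars of source
        // and the first j chars of target; current is the row being filled.
        int[] previous = new int[target.Length + 1];
        int[] current = new int[target.Length + 1];

        // Turning an empty prefix into j characters takes j insertions.
        for (int j = 0; j <= target.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= source.Length; i++)
        {
            // Turning i characters into an empty string takes i deletions.
            current[0] = i;

            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.Min(Math.Min(insertion, deletion), substitution);
            }

            // Reuse the two arrays instead of allocating a new row each time.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        // After the final swap, previous holds the last completed row.
        return previous[target.Length];
    }
}
```

A call such as `Levenshtein.Distance("kitten", "sitting")` returns 3 (two substitutions and one insertion). The comparison is case-sensitive and works on UTF-16 code units, which may or may not suit your data.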

George Mauer answered Oct 12 '22 14:10


You can also look at the very impressive library titled Sam's String Metrics: https://github.com/StefH/SimMetrics.Net. It includes a host of algorithms:

  • Hamming distance
  • Levenshtein distance
  • Needleman-Wunsch distance or Sellers Algorithm
  • Smith-Waterman distance
  • Gotoh Distance or Smith-Waterman-Gotoh distance
  • Block distance or L1 distance or City block distance
  • Monge Elkan distance
  • Jaro distance metric
  • Jaro Winkler
  • SoundEx distance metric
  • Matching Coefficient
  • Dice’s Coefficient
  • Jaccard Similarity or Jaccard Coefficient or Tanimoto coefficient
  • Overlap Coefficient
  • Euclidean distance or L2 distance
  • Cosine similarity
  • Variational distance
  • Hellinger distance or Bhattacharyya distance
  • Information Radius (Jensen-Shannon divergence)
  • Harmonic Mean
  • Skew divergence
  • Confusion Probability
  • Tau
  • Fellegi and Sunter (SFS) metric
  • TFIDF or TF/IDF
  • FastA
  • BlastP
  • Maximal matches
  • q-gram
  • Ukkonen Algorithms
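
To give a feel for how one of the token-based measures in this list can drive a fuzzy search without an index, here is a rough sketch of Dice's coefficient over character bigrams (a simple q-gram scheme with q = 2), combined with LINQ to rank candidates. It does not use the library's API; the class `BigramSimilarity`, its methods `Dice` and `Search`, and the threshold parameter are all hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; this is not the API of the library linked above.
public static class BigramSimilarity
{
    // Splits a string into overlapping two-character substrings, ignoring case.
    private static List<string> Bigrams(string text)
    {
        string lowered = text.ToLowerInvariant();
        var grams = new List<string>();
        for (int i = 0; i < lowered.Length - 1; i++)
        {
            grams.Add(lowered.Substring(i, 2));
        }
        return grams;
    }

    // Dice's coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b).
    // The result lies between 0.0 (nothing shared) and 1.0 (same bigrams).
    public static double Dice(string a, string b)
    {
        List<string> first = Bigrams(a);
        List<string> second = Bigrams(b);

        // Strings shorter than two characters have no bigrams; fall back
        // to a plain case-insensitive equality check for them.
        if (first.Count == 0 || second.Count == 0)
        {
            return string.Equals(a, b, StringComparison.OrdinalIgnoreCase) ? 1.0 : 0.0;
        }

        // Count shared bigrams with multiset semantics: each bigram in
        // second can be matched at most once.
        var unmatched = new List<string>(second);
        int shared = 0;
        foreach (string gram in first)
        {
            if (unmatched.Remove(gram))
            {
                shared++;
            }
        }

        return 2.0 * shared / (first.Count + second.Count);
    }

    // Returns the candidates whose score against query is at least
    // threshold, best match first.
    public static IEnumerable<string> Search(
        IEnumerable<string> candidates, string query, double threshold)
    {
        return candidates
            .Select(c => new { Text = c, Score = Dice(c, query) })
            .Where(p => p.Score >= threshold)
            .OrderByDescending(p => p.Score)
            .Select(p => p.Text);
    }
}
```

For example, `BigramSimilarity.Search(new[] { "Levenshtein", "Lucene", "Jaro" }, "Levenstein", 0.5)` yields only "Levenshtein", since the misspelled query shares 8 bigrams with it (score 16/19) and at most one with the others. A linear scan like this is fine for small lists; for large collections an index or a dedicated library is likely the better choice.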

Zaffiro answered Oct 12 '22 14:10