Recognizing similarity in strings

I'm working on a system that allows imported files to be localized into other languages.

This is mostly a private project to get the hang of MVC3, Entity Framework, LINQ, and so on, so I like doing some crazy things to spice up the end result. One of those things is recognizing similar strings.

Imagine you have the following list of strings - borrowed from a game I've worked with in the past:

Megabeth: Holy Roller Uniform - Includes Head, Torso, and Legs
Megabeth: Holy Roller Uniform Head
Megabeth: Holy Roller Uniform Legs
Megabeth: Holy Roller Uniform Torso
Megabeth: PAX East 2012 Uniform - Includes Head, Torso, and Legs
Megabeth: PAX East 2012 Uniform Head
Megabeth: PAX East 2012 Uniform Legs
Megabeth: PAX East 2012 Uniform Torso

As you can see, once users have translated the first 4 strings, the following 4 share a lot of similarities, in this case:

Megabeth
Uniform
Includes Head, Torso, and Legs
Head
Legs
Torso

Assume the first 4 strings are already translated. When a user selects the 5th string from the list, what kind of algorithm or technique can I use to show the user the 1st string (and potentially others) under a sub-header of "Similar strings"?

Edit - A little comment on the Levenshtein Distance: I'm currently targeting 10k strings in the database. Levenshtein Distance compares one pair of strings at a time, so in this case that is 10k x (10k - 1) possible combinations. How would I approach this in a feasible way? Is there a better solution than this particular algorithm?


asked Oct 22 '12 20:10

Lennard Fonteijn

1 Answer

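You could look into the Levenshtein Distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Pairs whose distance falls below a threshold you choose can be treated as similar. Two identical strings have a distance of zero.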

There's a C# implementation, amongst other languages, on Rosetta Code.
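
As a rough illustration of the idea, here is a minimal sketch, written in Python for brevity rather than C#. The names `levenshtein`, `similar_strings`, and `max_ratio` are made up for this example, and the 0.5 cutoff is an arbitrary assumption that you would need to tune on your own data. Note that a lookup for one selected string only compares that string against the others (about 10k distance computations per lookup), not every pair in the database.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, using a two-row dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def similar_strings(selected, candidates, max_ratio=0.5):
    """Rank candidates by distance to `selected`.

    A candidate is kept when its distance, divided by the length of the
    longer of the two strings, is at most `max_ratio` (a hypothetical
    threshold, not a recommended value).
    """
    matches = []
    for candidate in candidates:
        if candidate == selected:
            continue  # skip the string the user is currently looking at
        distance = levenshtein(selected, candidate)
        longest = max(len(selected), len(candidate)) or 1
        if distance / longest <= max_ratio:
            matches.append((distance, candidate))
    matches.sort()  # closest first
    return [candidate for _, candidate in matches]


translated = [
    "Megabeth: Holy Roller Uniform - Includes Head, Torso, and Legs",
    "Megabeth: Holy Roller Uniform Head",
    "Megabeth: Holy Roller Uniform Legs",
    "Megabeth: Holy Roller Uniform Torso",
]
suggestions = similar_strings("Megabeth: PAX East 2012 Uniform Head", translated)
```

With the strings from the question, the already translated "Holy Roller Uniform Head" entry comes out as the closest suggestion. Dividing by the length of the longer string is one possible way to keep a single threshold usable for both short and long strings; a plain absolute cutoff would work as well.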

answered Oct 11 '22 22:10

keyboardP


Tags: c#, asp.net-mvc-3, entity-framework, localization, similarity
