How to implement a SIMPLE "You typed ACB, did you mean ABC?"

Question

I know this is not a straight up question, so if you need me to provide more information about the scope of it, let me know. There are a bunch of questions that address almost the same issue (they are linked here), but never the exact same one with the same kind of scope and objective - at least as far as I know.

Context:

I have an MP3 file with ID3 tags for artist name and song title.
I have two tables, Artists and Songs.
The ID3 tags might be slightly off (e.g. Mikaell Jacksonne).
I'm using ASP.NET + C# and an MSSQL database.

I need to synchronize the MP3s with the database. Meaning:

The user launches a script
The script browses through all the MP3s
The script says "Is 'Mikaell Jacksonne' 'Michael Jackson' YES/NO"
The user picks, and we start over.

Examples of what the system could find:

In the database...

SONGS = {"This is a great song title", "This is a song title"}
ARTISTS = {"Michael Jackson"}

Outputs...

"This is a grt song title" did you mean "This is a great song title" ?
"This is song title" did you mean "This is a song title" ?
"This si a song title"  did you mean "This is a song title" ?
"This si song a title"  did you mean "This is a song title" ?
"Jackson, Michael" did you mean "Michael Jackson" ?
"JacksonMichael" did you mean "Michael Jackson" ?
"Michael Jacksno" did you mean "Michael Jackson" ?

etc.

I read some documentation at /how-do-you-implement-a-did-you-mean, but it is not exactly what I need, since I don't want to check against an entire dictionary. I also can't really use a web service, since the matching depends heavily on what I already have in my database. If possible, I'd also like to avoid dealing with distances and other complicated things.

I could use the Google API (or something similar) to do this, meaning that the script would spell-check the tag and then test the result against the database. But I feel there could be a better solution, since my database might end up being really specific, with weird songs and artists, which would make spell checking useless.

I could also try something like what has been explained in this post, using Soundex for C#.

Using a regular spell checker won't work because I won't be using words but names and 'titles'.

So my question is: is there a relatively simple way of doing this, and if so, what is it?

Any kind of help would be appreciated.

Thanks!

Paul Sonier · Accepted Answer

What you want is a similarity factor. Essentially, you want to compare your inputs ("Micheal Jackson", for example) to your expected values ("Michael Jackson"); if you score a very high similarity value to one of your expected values, you can ask the user.
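As a rough illustration of the similarity idea, here is a minimal sketch in Python (the same logic should port to C#). It is not a definitive implementation: the scoring function is the standard library's difflib.SequenceMatcher, chosen only because it is readily available, and the names similarity and suggest as well as the 0.8 threshold are assumptions made up for this example.

```python
from difflib import SequenceMatcher
from typing import Optional

def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical (ignoring case)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def suggest(tag: str, known: list[str], threshold: float = 0.8) -> Optional[str]:
    """Return the closest known value, or None if nothing is close enough.

    The 0.8 threshold is an arbitrary starting point, not a recommended value.
    """
    best = max(known, key=lambda candidate: similarity(tag, candidate), default=None)
    if best is not None and similarity(tag, best) >= threshold:
        return best
    return None

# Sample data taken from the question.
songs = ["This is a great song title", "This is a song title"]
artists = ["Michael Jackson"]

for tag, known in [("Micheal Jackson", artists), ("This is a grt song title", songs)]:
    match = suggest(tag, known)
    if match is not None:
        print(f'"{tag}" did you mean "{match}" ?')
```

The threshold would need tuning against real tags: too low and the user is asked about unrelated entries, too high and genuine typos are missed.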

One way of doing this is to hash the expected values into a fully packed hashtable. If you get your hashing algorithm right (and yes, this is the tricky bit), each input will hash to the closest expected value; once you've found the closest expected value, you can run a similarity evaluation against the input and that expected value; if you're above a certain threshold, ask the user.
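One very simplified way to sketch the hash-then-verify idea in Python is shown below. The hash here is a hypothetical choice for illustration, not the algorithm the answer has in mind: it lowercases the text, drops everything except letters and digits, and sorts the characters, so inputs made of the same letters in a different order land in the same bucket. Anything that misses a bucket falls back to scoring every known value with difflib.SequenceMatcher. The names bucket_key, build_index and lookup, and the 0.8 threshold, are assumptions.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

def bucket_key(text: str) -> str:
    """Hypothetical hash: lowercase, keep letters/digits only, sort the characters."""
    return "".join(sorted(re.sub(r"[^a-z0-9]", "", text.lower())))

def build_index(known):
    """Group the known values by their bucket key."""
    index = defaultdict(list)
    for value in known:
        index[bucket_key(value)].append(value)
    return index

def lookup(tag, index, threshold=0.8):
    """Try the cheap bucket lookup first; otherwise score every known value."""
    bucket = index.get(bucket_key(tag))
    if bucket:
        return bucket[0]  # same letters, different order or punctuation
    scored = [
        (SequenceMatcher(None, tag.lower(), candidate.lower()).ratio(), candidate)
        for values in index.values()
        for candidate in values
    ]
    score, best = max(scored, default=(0.0, None))
    return best if score >= threshold else None

# Sample data taken from the question.
artist_index = build_index(["Michael Jackson"])
song_index = build_index(["This is a great song title", "This is a song title"])

for tag in ["Jackson, Michael", "JacksonMichael", "Michael Jacksno"]:
    print(f'"{tag}" did you mean "{lookup(tag, artist_index)}" ?')
```

The bucket only catches reordered or transposed input; missing or extra letters change the key, which is why the fallback scan is still needed. Either way, the result is only a suggestion for the user to confirm.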

Tags:

nlp

spell-checking

marcgg

1 Answers

Paul Sonier
