I notice that users sometimes mistype their email address in a contact-us form, for example typing @yahho.com, @yhoo.com, or @yahoo.co instead of @yahoo.com.
I feel this could be corrected on the spot with some JavaScript: check the email address for likely mistakes such as the ones listed above, so that if the user types [email protected], an unobtrusive message can be displayed suggesting that they probably meant @yahoo.com and asking them to double-check that they typed their email correctly.
The question is:
How can I detect, in JavaScript, that a string is very similar to "yahoo" or "yahoo.com"? Or, in general, how can I measure the level of similarity between two strings?
P.S. (a side note) In my specific case, the users are not native English speakers, and most of them are nowhere near fluent. The site itself is not in English.
Here's a quick-and-dirty implementation that gives you some simple checks using the Levenshtein distance. Credit for the "levenshteinenator" goes to this link. You would add whatever popular domains you want to the domains array, and it checks whether the distance between each domain and the host part of the entered email is 1 or 2, which is close enough to assume there's a typo somewhere.
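// Levenshtein (edit) distance between the strings a and b
var levenshteinenator = function(a, b) {
  var cost;
  // get lengths
  var m = a.length;
  var n = b.length;
  // make sure a.length >= b.length (the distance is symmetric, so the swap
  // does not change the result; note the full m x n table is still allocated)
  if (m < n) {
    var c = a; a = b; b = c;
    var o = m; m = n; n = o;
  }
  // r[i][j] = distance between the first i chars of a and the first j chars of b
  var r = [];
  r[0] = [];
  for (var k = 0; k < n + 1; k++) {
    r[0][k] = k;
  }
  for (var i = 1; i < m + 1; i++) {
    r[i] = [];
    r[i][0] = i;
    for (var j = 1; j < n + 1; j++) {
      cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
      // deletion, insertion, substitution
      r[i][j] = minimator(r[i - 1][j] + 1, r[i][j - 1] + 1, r[i - 1][j - 1] + cost);
    }
  }
  return r[m][n];
};
// return the smallest of the three values passed in
// (<= rather than <, so a tie between x and y does not fall through to z)
var minimator = function(x, y, z) {
  if (x <= y && x <= z) return x;
  if (y <= x && y <= z) return y;
  return z;
};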
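var domains = ['yahoo.com', 'google.com', 'hotmail.com'];
var email = '[email protected]';
var parts = email.split('@');
// the host is whatever follows the last '@'; compare in lower case,
// and fall back to an empty string if there is no '@' at all
var host = parts.length > 1 ? parts[parts.length - 1].toLowerCase() : '';
var dist;
for (var x = 0; x < domains.length; x++) {
  dist = levenshteinenator(domains[x], host);
  // 1 or 2 edits away suggests a typo; 0 means the domain is already correct
  if (dist == 1 || dist == 2) {
    alert('did you mean ' + domains[x] + '?');
  }
}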
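If you'd rather show one unobtrusive hint than fire an alert for every near match, something like the following might do. It is only a sketch: editDistance and suggestDomain are names I made up for illustration, the user@... addresses are placeholder inputs, and the threshold of 2 simply mirrors the check above. The distance function keeps just two rows of the table instead of the whole matrix.

```javascript
// Levenshtein distance keeping only the previous and current rows of the table.
function editDistance(a, b) {
  // make b the shorter string so the rows are as short as possible
  if (a.length < b.length) { var t = a; a = b; b = t; }
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      // deletion, insertion, substitution
      curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: returns the closest known domain within maxDistance
// edits, or null if the host is already a known domain, nothing is close
// enough, or the input has no '@'.
function suggestDomain(email, knownDomains, maxDistance) {
  var at = email.lastIndexOf('@');
  if (at === -1) return null;
  var host = email.slice(at + 1).toLowerCase();
  var best = null;
  var bestDist = maxDistance + 1;
  for (var i = 0; i < knownDomains.length; i++) {
    var d = editDistance(host, knownDomains[i]);
    if (d === 0) return null; // exact match, nothing to suggest
    if (d < bestDist) { best = knownDomains[i]; bestDist = d; }
  }
  return best;
}

var known = ['yahoo.com', 'google.com', 'hotmail.com'];
var hint = suggestDomain('user@yahho.com', known, 2);
if (hint !== null) {
  console.log('Did you mean @' + hint + '?');
}
```

Returning the suggestion rather than alerting lets the form decide how to display it, for example as inline text under the email field.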
In addition to soundex, you may also want to have a look at algorithms for determining Levenshtein distance.
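For the more general "level of similarity" part of the question, one common trick (an assumption on my part, not something from the answers here) is to divide the Levenshtein distance by the length of the longer string, which gives a score between 0 and 1. A rough sketch, with levenshtein and similarity as made-up names:

```javascript
// Plain Levenshtein distance using two rows of the table.
function levenshtein(a, b) {
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means every position differs.
function similarity(a, b) {
  var longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty strings are identical
  return 1 - levenshtein(a, b) / longest;
}

console.log(similarity('yahho', 'yahoo'));
```

Where to put the cut-off is a judgment call. For short strings like domain names, a fixed distance of 1 or 2 may behave more predictably than a ratio.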