Using JavaScript to perform text matches with/without accented characters

I am using an AJAX-based lookup for names that a user searches in a text box.

I am assuming that all names in the database will be transliterated into Latin-based alphabets (i.e. no Cyrillic, Japanese or Chinese). However, the names will still contain accented characters such as ç, ê and even č and ć.

A simple search for "Micic" will not match "Mičić", though, and users expect it to.

The AJAX lookup uses regular expressions to determine a match. To match more accented characters, I have modified the regular expression comparison using this function. However, it is clumsy, and it does not cover every accented character (č and ć, for example, are missing).

    // Build a regex source string in which each plain letter (a, c, e, i, n,
    // o, s, u, y) also matches its common accented forms.
    function makeComp (input) {
        input = input.toLowerCase ();
        var output = '';
        for (var i = 0; i < input.length; i ++)
        {
            var ch = input.charAt (i);
            if (ch == 'a')
                output = output + '[aàáâãäåæ]';
            else if (ch == 'c')
                output = output + '[cç]';
            else if (ch == 'e')
                output = output + '[eèéêëæ]';
            else if (ch == 'i')
                output = output + '[iìíîï]';
            else if (ch == 'n')
                output = output + '[nñ]';
            else if (ch == 'o')
                output = output + '[oòóôõöø]';
            else if (ch == 's')
                output = output + '[sß]';
            else if (ch == 'u')
                output = output + '[uùúûü]';
            else if (ch == 'y')
                output = output + '[yÿ]';
            else
                // Escape regex metacharacters so input such as "(" cannot break the pattern.
                output = output + ch.replace (/[.*+?^${}()|[\]\\]/g, '\\$&');
        }
        return output;
    }
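To see the limitation concretely, here is a minimal sketch that restates the same substitution as a lookup table and tests a search for "Micic" against stored names. The table contents are copied from the question's function; the names `ACCENT_CLASSES` and `escapeRegExp` are hypothetical, and escaping regex metacharacters in the query is an assumption about what the lookup needs. Because č and ć are not in the table, "Mičić" is not matched.

```javascript
// Same character classes as the question's makeComp, expressed as a table.
const ACCENT_CLASSES = {
  a: '[aàáâãäåæ]',
  c: '[cç]',
  e: '[eèéêëæ]',
  i: '[iìíîï]',
  n: '[nñ]',
  o: '[oòóôõöø]',
  s: '[sß]',
  u: '[uùúûü]',
  y: '[yÿ]',
};

// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Turn a plain query into a regex source that also accepts accented forms.
function makeComp(input) {
  let output = '';
  for (const ch of input.toLowerCase()) {
    output += ACCENT_CLASSES[ch] || escapeRegExp(ch);
  }
  return output;
}

// A query typed without accents, tested against two stored names.
const pattern = new RegExp(makeComp('Micic'), 'i');
console.log(pattern.test('Micic')); // true
console.log(pattern.test('Mičić')); // false: č and ć are not in the table
```

Every new accented letter has to be added to the table by hand, which is exactly the maintenance burden the question is trying to avoid.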

Apart from a substitution function like this, is there a better way? Perhaps to "deaccent" the string being compared?

Philip asked Apr 18 '11 at 09:04


1 Answer

There is a way to "deaccent" the string being compared without a substitution function that lists every accent you want to remove.

Here is the simplest solution I can think of to remove accents (and other diacritics) from a string.

See it in action:

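    var string = "Ça été Mičić. ÀÉÏÓÛ";
    console.log(string);

    // NFD splits each accented letter into its base letter followed by
    // combining marks; the regex then removes the Combining Diacritical
    // Marks block (U+0300 to U+036F).
    var string_norm = string.normalize('NFD').replace(/[\u0300-\u036f]/g, "");
    console.log(string_norm); // Ca ete Micic. AEIOU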
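To apply this to the name lookup in the question, one option is to deaccent both the stored name and the user's query, lowercase them, and compare with a plain substring test, so no per-character regular expression is needed. This is a minimal sketch, assuming the lookup has the candidate names as an array of strings; `deaccent`, `findMatches` and `EXTRA_FOLDS` are hypothetical names. Note that NFD only splits characters that have a canonical decomposition: letters such as ø, æ, ß, ł and đ have none, so they survive the combining-mark removal and are folded by a small explicit table (an assumption; extend it for your data).

```javascript
// Letters that NFD does not decompose into base letter + combining mark.
// This table is an illustrative assumption; extend it for your data.
const EXTRA_FOLDS = { 'ø': 'o', 'æ': 'ae', 'ß': 'ss', 'ł': 'l', 'đ': 'd' };

// Lowercase, split accented letters into base + combining marks (NFD),
// drop the marks in U+0300 to U+036F, then fold the remaining special letters.
function deaccent(str) {
  return str
    .toLowerCase()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/[øæßłđ]/g, (ch) => EXTRA_FOLDS[ch]);
}

// Return every name that contains the query, ignoring case and accents.
function findMatches(query, names) {
  const needle = deaccent(query);
  return names.filter((name) => deaccent(name).includes(needle));
}

const names = ['Mičić', 'François Ćosić', 'Søren Łukasz', 'Müller'];
console.log(findMatches('micic', names)); // [ 'Mičić' ]
console.log(findMatches('cosic', names)); // [ 'François Ćosić' ]
console.log(findMatches('soren', names)); // [ 'Søren Łukasz' ]
```

Because both sides go through the same `deaccent` function, a user who does type the accents ("Mičić") still gets the same results as one who does not.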
Takit Isy answered Sep 23 '22 at 23:09