Filtering a list of strings based on user locale

When working on a JavaScript project with AngularJS 1.6, I have a list of strings which I'd like to filter. For instance, assume my list contains árbol, cigüeña, nido and tubo.

When filtering strings in Spanish, if I filtered for "u", I'd expect both cigüeña and tubo to appear, which would be the most natural result for a Spaniard. However, this is not the case in German - u and ü are different letters and thus a German will not want to see cigüeña on the list. So I am looking for a way to make my list filtering aware of the user's locale.

I happen to have an object that maps lots of accented characters to their base letters, such that:

diacritics["á"] = "a";
diacritics["ü"] = "u";
// and so on...

This is what my filtering code looks like:

function matches(word, search) {
    var cleanWord = removeDiacritics(word.toLowerCase());
    var cleanSearch = removeDiacritics(search.toLowerCase());
    return cleanWord.indexOf(cleanSearch) > -1;
}

function removeDiacritics(word) {
    function match(a) {
        return diacritics[a] || a;
    }
    // Replace every character outside the basic ASCII range via match().
    return word.replace(/[^\u0000-\u007E]/g, match);
}

The above code just removes all diacritics, so I thought to make it aware of the user's locale. Thus, I changed the match() function to this:

function match(a) {
    if (diacritics[a] && a.localeCompare(diacritics[a]) === 0) {
        return diacritics[a];
    }
    return a;
}

Unfortunately, this doesn't work. The localeCompare function returns the same values when comparing "u" and "ü" with the German and Spanish locales, so that was not the answer here. I've gone over the reference for the localeCompare method and tried the usage and sensitivity options, but they don't seem to help much here.
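The behaviour described above can be reproduced with a short sketch. It assumes the default collation usage ("sort") combined with sensitivity "base"; the helper name sameBaseLetter is hypothetical, and results can vary with the locale data shipped by the runtime.

```javascript
// Hypothetical helper: true when the locale's default ("sort") collation
// treats the two strings as the same base letter.
function sameBaseLetter(a, b, locale) {
    return a.localeCompare(b, locale, { sensitivity: 'base' }) === 0;
}

// Compare the two locales from the question side by side.
console.log('es:', sameBaseLetter('u', 'ü', 'es'));
console.log('de:', sameBaseLetter('u', 'ü', 'de'));
```

If both lines print the same value, this check alone cannot tell Spanish expectations from German ones, which matches what the question reports.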

How could I tweak my code for this to work? Is there any library which can handle this properly for me?

unpollito asked Nov 16 '17 12:11



1 Answer

I'd get the user's locale directly from the browser via navigator, an object representing the user agent:

var language = navigator.language;

This assigns the locale code of the user's browser to language; in my case, that is en-US. I found this site helpful for finding locale codes to test other regions of the world.
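If the browser reports a locale the collator has no data for, a fallback may be useful. The following is a minimal sketch, not part of the original approach: resolveLocale is a hypothetical helper, and the 'en' fallback is an assumption. It relies on Intl.Collator.supportedLocalesOf, which returns the subset of the requested locales that the runtime supports.

```javascript
// Hypothetical helper: pick the first preferred locale that Intl.Collator
// supports, falling back to a default when none is available.
function resolveLocale(preferred, fallback) {
    var supported = Intl.Collator.supportedLocalesOf(preferred);
    return supported.length > 0 ? supported[0] : fallback;
}

// navigator.languages is the user's ordered preference list in browsers;
// where navigator is unavailable, this sketch just uses the fallback.
var preferred = (typeof navigator !== 'undefined' && navigator.languages)
    ? Array.prototype.slice.call(navigator.languages)
    : [];
var language = resolveLocale(preferred, 'en'); // 'en' is an assumed default
```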

My strFromLocale function is comparable to your removeDiacritics function:

function strFromLocale(str) {
    function match(letter) {
        function letterMatch(letter, normalizedLetter) {
            var location = new Intl.Collator(language, {usage: 'search', sensitivity: 'base' }).compare(letter, normalizedLetter);
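            // 0 means the locale treats both letters as equivalent.
            return location === 0;
        }
        // Decompose the letter (NFD) and strip its combining marks.
        var normalizedLetter = letter.normalize('NFD').replace(/[\u0300-\u036f]/g, "");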
        if ( letterMatch(letter, normalizedLetter) ) {
            return normalizedLetter;
        } else {
            return letter;
        }
    }
    return str.replace(/[^\u0000-\u007E]/g, match);
}
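The normalization step inside match can be looked at on its own. As a small sketch (stripCombiningMarks is a hypothetical name), normalize('NFD') splits an accented letter into its base letter plus a combining mark, and the regular expression then removes marks in the U+0300 to U+036F range:

```javascript
// Decompose each character (NFD), then drop the combining marks in the
// U+0300 to U+036F block, leaving only the base letters.
function stripCombiningMarks(str) {
    return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log('ü'.normalize('NFD').length);    // 2: "u" followed by U+0308
console.log(stripCombiningMarks('cigüeña')); // "ciguena"
```

Letters without a canonical decomposition, such as ø, pass through unchanged, so this step only covers characters built from a base letter plus marks.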

Note the line with Intl.Collator. It compares the accented letter with its normalized base letter under the given language's search collation rules, which tells us whether that language treats the two as distinct letters. For example:

/* English */
new Intl.Collator('en-US', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> 0

/* Swedish */
new Intl.Collator('sv', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> -1

/* German */
new Intl.Collator('de', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> -1

As you can see, the letterMatch function returns true if and only if the result of Intl.Collator's compare is 0. A result of 0 means the language does not distinguish the accented letter from its base letter when searching, so the replacement is safe.

With that, here are some tests of the strFromLocale function:

var language = navigator.language; // en-US
strFromLocale("cigüeña");
>>> ciguena

var language = 'sv' // Swedish
strFromLocale("cigüeña");
>>> cigüena

var language = 'de' // German
strFromLocale("cigüeña");
>>> cigüena

var language = 'es-mx' // Spanish - Mexico
strFromLocale("cigüeña");
>>> cigueña
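To connect this back to the filtering in the question, here is one possible sketch. The names makeLocaleFolder and filterWords are hypothetical, and passing the locale as a parameter (rather than reading a global language variable) is my own choice. It also creates the collator once per locale instead of once per character.

```javascript
// Build a folding function for one locale, creating the collator once
// instead of once per character.
function makeLocaleFolder(locale) {
    var collator = new Intl.Collator(locale, { usage: 'search', sensitivity: 'base' });
    return function fold(str) {
        return str.toLowerCase().replace(/[^\u0000-\u007E]/g, function (letter) {
            var base = letter.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
            // Replace only when the locale treats both as the same letter.
            return collator.compare(letter, base) === 0 ? base : letter;
        });
    };
}

// Keep the words whose folded form contains the folded search term.
function filterWords(words, search, locale) {
    var fold = makeLocaleFolder(locale);
    var needle = fold(search);
    return words.filter(function (word) {
        return fold(word).indexOf(needle) > -1;
    });
}

var words = ['árbol', 'cigüeña', 'nido', 'tubo'];
console.log(filterWords(words, 'u', 'en-US'));
console.log(filterWords(words, 'u', 'de'));
```

Given the collator results shown earlier, I would expect the German call to leave out cigüeña, though the exact output depends on the locale data available in the runtime.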
Cole answered Sep 21 '22 15:09