I am building a search feature that will use JavaScript autocomplete. I am from Finland, so I have to deal with Finnish special characters such as ä, ö and å.
When the user types text into the search input field, I try to match that text against the data.
Here is a simple example that does not work correctly when the user types, for example, "ää". The same goes for "äl".
var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Does not work
var searchterm = "äl";
// does not work
//var searchterm = "ää";
// Works
//var searchterm = "wi";
if ( new RegExp("\\b"+searchterm, "gi").test(title) ) {
    $("#result").html("Match: ("+searchterm+"): "+title);
} else {
    $("#result").html("nothing found with term: "+searchterm);
}
http://jsfiddle.net/7TsxB/
So how can I get the ä, ö and å characters to work with JavaScript regex?
I think I should use Unicode escapes, but how should I do that? The codes for those characters are: [\u00C4,\u00E4,\u00C5,\u00E5,\u00D6,\u00F6] => ÄäÅåÖö
This will make your regular expressions work with all Unicode regex engines. In addition to the standard notation, \p{L}, Java, Perl, PCRE, the JGsoft engine, and XRegExp 3 allow you to use the shorthand \pL. The shorthand only works with single-letter Unicode properties.
In Tcl's regex syntax (not in JavaScript), \m matches only at the start of a word. That is, it matches at any position that has a non-word character to the left of it, and a word character to the right of it. It also matches at the start of the string if the first character in the string is a word character. \M matches only at the end of a word.
In JavaScript engines that predate the ES2015 u flag and ES2018 Unicode property escapes, the only Unicode support in regexes is matching specific code points with \uFFFF notation. You can use those in ranges in character classes.
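The code-point approach described above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the names WORD_CHARS and matchesWordStart are hypothetical, and the class covers just the six Finnish letters listed in the question plus the ASCII word characters.

```javascript
// Hypothetical character set: ASCII word characters plus Ä ä Å å Ö ö,
// written as \u code point escapes (assumption: no other letters are needed).
const WORD_CHARS = "A-Za-z0-9_\\u00C4\\u00E4\\u00C5\\u00E5\\u00D6\\u00F6";

// Emulates "\b at the start of a word": the term must sit at the start of
// the string or directly after a character that is not in WORD_CHARS.
function matchesWordStart(text, term) {
  // Escape regex metacharacters so user input is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("(?:^|[^" + WORD_CHARS + "])" + escaped, "i");
  return pattern.test(text);
}

const title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
console.log(matchesWordStart(title, "äl")); // true  (start of "älkää")
console.log(matchesWordStart(title, "ää")); // true  (start of "ääkköstesti")
console.log(matchesWordStart(title, "kö")); // false (only occurs inside words)
```

Unlike a whitespace check, this also accepts a term that follows punctuation, because any character outside the class counts as a separator.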
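The problem is the word boundary \b. In JavaScript, \b is defined in terms of \w, which covers only the ASCII letters, digits and underscore. The characters ä, ö and å therefore count as non-word characters, so there is no word boundary between a space and a word that starts with one of them. Instead of using \b, try using (?:^|\\s)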
var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// All three search terms match with this pattern
var searchterm = "äl";
//var searchterm = "ää";
//var searchterm = "wi";
if ( new RegExp("(?:^|\\s)"+searchterm, "gi").test(title) ) {
    $("#result").html("Match: ("+searchterm+"): "+title);
} else {
    $("#result").html("nothing found with term: "+searchterm);
}
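To see why the two patterns behave differently, the checks below can be run without jQuery or a browser. It is a small sketch that reuses the title string from the question; the variable name m is only illustrative.

```javascript
const title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";

// \w covers only [A-Za-z0-9_], so "ä" is treated as a non-word character.
console.log(/\w/.test("ä")); // false

// There is no boundary between the space and "ä" in " älkää", so \b fails.
console.log(new RegExp("\\bäl", "i").test(title)); // false

// "ää" does match with \b, but for the wrong reason: the boundary found is
// the one between "k" and "ä" inside "älkää", not the start of a word.
const m = new RegExp("\\bää", "i").exec(title);
console.log(title.slice(m.index - 1, m.index + 2)); // "kää"

// The group-based pattern matches at the start of "älkää".
console.log(new RegExp("(?:^|\\s)äl", "i").test(title)); // true
```

Note that with (?:^|\s) the matched text includes the leading whitespace character, which matters if the match position is later used for highlighting.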
Breakdown:
(?:
    Parentheses () form a capture group in regex. Parentheses that open with a question mark and colon, (?:, form a non-capturing group. They just group the terms together.
^
    The caret symbol matches the beginning of a string.
|
    The bar is the "or" operator.
\s
    Matches whitespace (it appears as \\s in the string because we have to escape the backslash).
)
    Closes the group.
So instead of using \b, which matches word boundaries but treats ä, ö and å as non-word characters, we use a non-capturing group that matches either the beginning of the string or a whitespace character.