
No \p{L} for JavaScript Regex ? Use Unicode in JS regex [duplicate]

I need to repeat a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ several times in my regex, but I find this very ugly. So I tried \p{L}, but it does not work in JavaScript.

Any ideas?

My current regex: [a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ][a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ' ,"-]*[a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ'",]+

I would like something like this: [\p{L}][\p{L}' ,"-]*[\p{L}'",]+ (or at least something shorter than the current expression).

charles Lgn asked May 04 '18 15:05




2 Answers

What you need to match is a subset of what you asked for, so first define exactly which set of characters you need. \pL (short for \p{L}) matches every letter from every language.

The explicit character list is kind of ugly, but it doesn't affect performance and is arguably the best way around this kind of problem in JS. ECMAScript 2018 adds support for \p{L} (it requires the u flag), but at the time of writing it is far from being implemented by all major browsers.

If it's a matter of taste, you could reduce the ugliness a bit:

var characterSet = 'a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ';
var re = new RegExp('[' + characterSet + ']' + '[' + characterSet + '\' ,"-]*' + '[' + characterSet + '\'",]+');

Credit for this update goes to @Francesco:

var pCL = 'a-zA-ZáàâäãåçéèêëíìîïñóòôöõúùûüýÿæœÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÝŸÆŒ';
var re = new RegExp(`[${pCL}][${pCL}' ,"-]*[${pCL}'",]+`);
console.log(re.source);
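
For engines that do implement the ES2018 property escapes mentioned above, the pattern from the question can be written directly. This is only a minimal sketch: the ^ and $ anchors and the helper name isValidName are assumptions added for illustration and are not part of the original question.

```javascript
// The question's pattern using ES2018 Unicode property escapes.
// The "u" flag is required; without it, \p{L} is not read as a property escape.
// The ^ and $ anchors are an assumption so that test() checks the whole string.
const nameRe = /^\p{L}[\p{L}' ,"-]*[\p{L}'",]+$/u;

// Hypothetical helper, named only for this example.
function isValidName(input) {
  return nameRe.test(input);
}

console.log(isValidName("Jean-Luc")); // true
console.log(isValidName("Łukasz"));   // true: Ł is a letter outside the hand-written list
console.log(isValidName("1234"));     // false: digits are not in \p{L}
```

Note that \p{L} accepts letters from every script, which is broader than the hand-written list, so the explicit character set is still the better fit if only those characters should be allowed.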
revo answered Oct 13 '22 04:10



You can use the XRegExp add-on, which supports matching Unicode letters:

var unicodeWord = XRegExp("^\\pL+$"); // L: Letter

You can find more examples of matching Unicode in JavaScript here:

http://xregexp.com/plugins/

Federico Piazza answered Oct 13 '22 04:10
