I have a regex that finds all 4-byte Unicode characters in a string, and I would like to make it compatible with all popular browsers.
The following code works fine in Chrome and Firefox, but Safari throws "Invalid regular expression: range out of order in character class":
var match = 'aaa😚aaa'.match(/[\u{10000}-\u{10FFFF}]/gu);
So my question is: how should I change the regexp so that it matches all 4-byte Unicode characters in a string without using the unicode feature of regex (the u flag)?
To match a specific Unicode code point, use \uFFFF where FFFF is the hexadecimal number of the code point you want to match. You must always specify 4 hexadecimal digits. E.g. \u00E0 matches à, but only when encoded as a single code point U+00E0. Perl, PCRE, Boost, and std::regex do not support the \uFFFF syntax.
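As a small illustration of the "single code point" caveat, here is a sketch in JavaScript (the variable names are only for illustration). It compares the precomposed form of à with the decomposed form, which is a plain "a" followed by a combining grave accent:

```javascript
// Precomposed "à": the single code point U+00E0.
var precomposed = '\u00E0';
// Decomposed "à": "a" (U+0061) followed by a combining grave accent (U+0300).
var decomposed = 'a\u0300';

// Non-global regex, so repeated test() calls carry no lastIndex state.
var pattern = /\u00E0/;

console.log(pattern.test(precomposed)); // true
console.log(pattern.test(decomposed));  // false
```

Both strings render the same on screen, but only the first one contains U+00E0.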
Each character is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as 1 byte in UTF-8. These code points are the same as those in ASCII CCSID 367. Any other character is encoded with more than 1 byte in UTF-8.
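To see the 1 to 4 byte range in practice, the sketch below counts UTF-8 bytes for a few characters. It assumes a runtime that provides the TextEncoder global (modern browsers and recent Node.js); utf8ByteLength is a hypothetical helper name, not a built-in:

```javascript
// Assumption: TextEncoder is available; it always encodes to UTF-8.
var encoder = new TextEncoder();

// Hypothetical helper: number of UTF-8 bytes needed for a string.
function utf8ByteLength(str) {
  return encoder.encode(str).length;
}

console.log(utf8ByteLength('a'));            // 1 (U+0061, ASCII)
console.log(utf8ByteLength('\u00E0'));       // 2 (à, U+00E0)
console.log(utf8ByteLength('\u20AC'));       // 3 (€, U+20AC)
console.log(utf8ByteLength('\uD83D\uDE1A')); // 4 (😚, U+1F61A)
```

The 4-byte characters are exactly the code points above U+FFFF, which is the range the question wants to match.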
U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.
\u000d — Carriage return — \r. \u2028 — Line separator. \u2029 — Paragraph separator.
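As a rough sketch of how these escapes can be used in a JavaScript regex, the following splits a string on line terminators. Note that \n (line feed) and the \r\n pair are additions here and are not in the list above; the variable names are illustrative:

```javascript
// \r\n is listed first so a Windows line ending counts as one break, not two.
var lineTerminators = /\r\n|[\n\r\u2028\u2029]/;

var text = 'one\r\ntwo\u2028three\u2029four\nfive';
var lines = text.split(lineTerminators);

console.log(lines.length); // 5
console.log(lines);
```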
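The Safari version in question does not support the ES6 regular expression syntax used here (the u flag and \u{...} escapes). All you can do is transpile the regex to conform with the ES5 regex syntax. JavaScript strings are UTF-16, where every code point from U+10000 to U+10FFFF is stored as a surrogate pair: a high surrogate (\uD800-\uDBFF) followed by a low surrogate (\uDC00-\uDFFF). Matching that pair is therefore equivalent: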
console.log('aaa😚aaa'.match(/(?:[\uD800-\uDBFF][\uDC00-\uDFFF])/g));
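To check that the surrogate-pair pattern really picks out code points above U+FFFF, here is a minimal ES5-style sketch that decodes each match back to its code point. The helper names toCodePoint and findAstral are hypothetical, and the arithmetic is the standard UTF-16 surrogate decoding:

```javascript
// Same character classes as above; the non-capturing group is not needed.
var astral = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;

// Hypothetical helper: decode a two-unit surrogate pair to its code point.
function toCodePoint(pair) {
  var high = pair.charCodeAt(0);
  var low = pair.charCodeAt(1);
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Hypothetical helper: list every astral character with its code point.
function findAstral(str) {
  var matches = str.match(astral) || []; // match() returns null when nothing is found
  return matches.map(function (pair) {
    return {
      char: pair,
      codePoint: 'U+' + toCodePoint(pair).toString(16).toUpperCase()
    };
  });
}

var found = findAstral('aaa😚aaa');
console.log(found.length);       // 1
console.log(found[0].codePoint); // U+1F61A
```

One caveat: the pattern only matches well-formed pairs, so a lone surrogate in a malformed string is skipped rather than reported.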