I'm trying to collect all dash characters so I can use them while analyzing raw text data. I found that the Unicode regex \p{Pd} should match all of them, but it turns out that this character, −, doesn't match!
Here is more info about this char: https://www.fileformat.info/info/unicode/char/2212/index.htm
Is it a bug or a feature? In practice, this behavior isn't very useful.
The Unicode character U+2212 MINUS SIGN is a math symbol, not a punctuation mark: its General Category is Sm (Math Symbol), so it is matched by \p{Math} but not by \p{Punctuation} (which includes \p{Dash_Punctuation}, i.e. \p{Pd}).
You may want to try \p{Dash} instead and check whether it covers all your needs.
Ref: Properties for U+2212
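To see the difference concretely, here is a minimal sketch that tests U+2212 against each of the properties mentioned above, with U+2013 EN DASH included for comparison. It assumes a JavaScript engine that supports Unicode property escapes (the `u` flag with `\p{...}`); the variable names are only illustrative.

```javascript
// U+2212 MINUS SIGN and, for comparison, U+2013 EN DASH.
const minus = '\u2212';
const enDash = '\u2013';

// Property escapes require the `u` flag.
const checks = {
  Pd: /\p{Pd}/u,                   // General_Category = Dash_Punctuation
  Punctuation: /\p{Punctuation}/u, // General_Category = Punctuation (any kind)
  Math: /\p{Math}/u,               // binary property Math
  Dash: /\p{Dash}/u,               // binary property Dash
};

for (const [name, re] of Object.entries(checks)) {
  console.log(name, 'minus:', re.test(minus), 'en dash:', re.test(enDash));
}
```

The minus sign should fail the \p{Pd} and \p{Punctuation} checks but pass \p{Math} and \p{Dash}, while the en dash passes both \p{Pd} and \p{Dash}.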
Edit:
Here is an "official" list of all the characters having a Dash Unicode property: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Dash=Yes:], including the U+2212 MINUS SIGN character.
In Unicode 12.0, the JavaScript regular expression:
/\p{Dash}/u
would be equivalent to:
/[\u002D\u058A\u05BE\u1400\u1806\u2010\u2011\u2012\u2013\u2014\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]/
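For the original goal of collecting dashes from raw text, something along these lines might work. This is only a sketch: `findDashes` and `sample` are hypothetical names, and it assumes an engine with `String.prototype.matchAll` and Unicode property escapes.

```javascript
// Hypothetical helper: lists every character in `text` that has the
// Dash property, with its position and code point.
function findDashes(text) {
  return [...text.matchAll(/\p{Dash}/gu)].map((m) => ({
    char: m[0],
    index: m.index,
    codePoint:
      'U+' + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
  }));
}

// HYPHEN-MINUS, MINUS SIGN and EN DASH in one string.
const sample = 'a-b \u2212 5 \u2013 c';
console.log(findDashes(sample).map((d) => d.codePoint));
```

Note that the set matched by \p{Dash} depends on the Unicode version the engine implements, so on newer engines it may cover more characters than the explicit Unicode 12.0 list above.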