Ruby's /[[:punct:]]/ is supposed to match all "punctuation characters". According to Wikipedia, per the POSIX standard this means /[\]\[!"#$%&'()*+,./:;<=>?@\^_`{|}~-]/.
It matches: -[]\;',./!@#%&*()_{}::"?
However, it does not match: =`~$^+|<> (at least in ruby 1.9.3p194).
What gives?
The less-than and greater-than signs are in the Unicode "Symbol, Math" category, not a punctuation category. You can see this if you force the regex's encoding to UTF-8. By default the regex takes the source encoding; presumably your source is UTF-8 encoded, while my default source encoding is something else:
2.1.2 :004 > /[[:punct:]]/u =~ '<'
=> nil
2.1.2 :005 > /[[:punct:]]/ =~ '<'
=> 0
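To see which of the POSIX punctuation characters fall on which side, here is a minimal sketch that sorts the 32 ASCII characters by Unicode general category. The constant names are made up for illustration, and the /u flag is used so the result does not depend on the source encoding. It relies only on the \p{S} (Symbol) and \p{P} (Punctuation) properties, not on [[:punct:]] itself, whose behaviour may differ between Ruby versions.

```ruby
# The 32 printable ASCII characters that are neither letters nor digits,
# i.e. the POSIX [:punct:] set, built as UTF-8 strings.
ASCII_PUNCT = (33..126).map { |n| n.chr(Encoding::UTF_8) }
                       .reject { |c| c =~ /[A-Za-z0-9]/ }

# Split them by Unicode general category: Symbol (S) versus everything else.
SYMBOLS, PUNCTUATION = ASCII_PUNCT.partition { |c| c =~ /\p{S}/u }

puts SYMBOLS.join      # => $+<=>^`|~
puts PUNCTUATION.join  # => !"#%&'()*,-./:;?@[\]_{}
```

The nine characters in the first group are exactly the ones the question reports as not matching.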
If you force the regex to ASCII encoding with the /n option, you'll see it categorize '<' as punct, which I think is what you want. However, this will probably cause problems if your source contains characters outside the ASCII subset of UTF-8.
2.1.2 :009 > /[[:punct:]]/n =~ '<'
=> 0
A better solution would be to use the 'Symbol' category in your regex instead of the 'punct' one; it matches '<' in UTF-8 encoding:
2.1.2 :012 > /\p{S}/u =~ '<'
=> 0
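If the goal is to match every character in the POSIX list, \p{S} alone is not enough, since it skips characters like '!' and '-'. One possible approach, sketched here as an assumption about what the asker wants, is to combine the Punctuation and Symbol categories in a single character class. The constant names are hypothetical.

```ruby
# Matches any character in Unicode category P (Punctuation) or S (Symbol).
PUNCT_OR_SYMBOL = /[\p{P}\p{S}]/u

# The POSIX [:punct:] set: printable ASCII minus letters and digits.
POSIX_PUNCT_CHARS = (33..126).map { |n| n.chr(Encoding::UTF_8) }
                             .reject { |c| c =~ /[A-Za-z0-9]/ }

# Collect any character from the POSIX list that the class fails to match.
UNMATCHED = POSIX_PUNCT_CHARS.reject { |c| c =~ PUNCT_OR_SYMBOL }

puts UNMATCHED.empty?  # => true
```

Note that this class also matches non-ASCII punctuation and symbols, so it is broader than the POSIX definition.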
There's a longer list of categories here.