Unicode specifies that \X should match an "extended grapheme cluster", for instance a base character followed by zero or more combining characters. (I believe this is a simplification, but it may suffice for my needs.)
I'm pretty sure at least Perl supports \X in its regular expressions.
But Vim defines \X to match any character that is not a hexadecimal digit.
Does Vim have any equivalent to \X or any way to match a Unicode extended grapheme cluster?
Vim does have a concept of combining or "composing" characters, but its documentation does not cover whether or how they are supported in regular expressions.
It seems that Vim does not yet support this directly, but I am still interested in a workaround where a search will highlight all characters which include a combining character in at least the most basic range of U+0300 to U+0364.
You can search for all characters and ignore composing characters with \Z. Or you can search for a range of Unicode characters. Read :help /[] for more information on both.
The last post here may offer some more help:
http://vim.1045645.n5.nabble.com/using-regexp-to-search-for-Unicode-code-points-and-properties-td1190333.html
But Vim's regex does not have Unicode property character classes like Perl's.
If your vim installation is compiled with perl support, you may be able to run:
:perldo s/\X/replacement/g
I installed vim-nox on Debian (which includes Perl support), and matching \X with perldo does indeed work, but I'm not sure it will do what you want, since all normal characters are also matched and it doesn't seem like perldo will get you highlighting in Vim.
While it's not perfect, if you can get Perl support, you can use Unicode blocks and categories, which means you can use \p{Block: Combining_Diacritical_Marks} or \p{Category: Nonspacing_Mark} to at least detect certain characters, though you still won't get highlighting.
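To see how \X and the category property combine, here is a rough sketch run with a plain perl interpreter rather than inside Vim. The sample string and the variable names are made up for illustration. It splits the text into clusters with \X, then keeps only the clusters that contain a nonspacing mark, which sidesteps the problem of \X matching every normal character too.

```perl
use strict;
use warnings;

# Hypothetical sample: "e" + U+0301, a plain "x", then "o" + U+0308.
my $text = "e\x{0301}xo\x{0308}";

# \X matches every extended grapheme cluster, including plain characters.
my @clusters = $text =~ /\X/g;

# Keep only the clusters that contain at least one nonspacing mark.
my @marked = grep { /\p{Category: Nonspacing_Mark}/ } @clusters;

# Print code points rather than raw characters to keep the output ASCII.
for my $cluster (@marked) {
    print join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $cluster)), "\n";
}
```

This should print one line per marked cluster (U+0065 U+0301, then U+006F U+0308) and skip the plain "x". Swapping in \p{Block: Combining_Diacritical_Marks} would restrict the check to that block only.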