I'm being an idiot.
Someone cut and pasted some text from Microsoft Word into my lovely HTML files.
I now have these Unicode characters instead of regular quote symbols (i.e. quotes appear as <92> in the text).
I want to do a regex replace but I'm having trouble selecting them.
:%s/\u92/'/g
:%s/\u5C/'/g
:%s/\x92/'/g
:%s/\x5C/'/g
...all fail. My google-fu has failed me.
Vim supports Unicode natively. If your X or console keymap is set up to enter Unicode characters, they will work fine in Vim. Alternatively, there are two other ways of entering these characters. The slow way is just to use their hex code.
While in insert mode, you can insert special characters in Vim by pressing <ctrl-k> followed by a two-character lookup code.
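As a small illustration of the lookup codes mentioned above, here are a few digraphs from Vim's default table. The particular characters picked are only examples; run :digraphs on your own build to confirm the codes it offers.

```vim
" List every digraph Vim knows, with its two-character code and code point.
:digraphs

" In Insert mode (these are keystrokes, not Ex commands):
"   Ctrl-K E u   inserts €  (U+20AC)
"   Ctrl-K ' 9   inserts ’  (U+2019, right single quote)
"   Ctrl-K " 9   inserts ”  (U+201D, right double quote)
```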
From :help regexp (lightly edited), you need to use some specific syntax to select Unicode characters with a regular expression in Vim:
\%u match specified multibyte character (eg \%u20ac)
That is, to search for the unicode character with hex code 20AC, enter this into your search pattern:
\%u20ac
The full table of character search patterns includes some additional options:
\%d  match specified decimal character (eg \%d123)
\%x  match specified hex character (eg \%x2a)
\%o  match specified octal character (eg \%o040)
\%u  match specified multibyte character (eg \%u20ac)
\%U  match specified large multibyte character (eg \%U12345678)
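Applied to the original question, the patterns above might be used roughly as follows. This is a sketch, assuming the <92> that Vim displays really is the single byte 0x92 (the Windows-1252 right single quote that Word tends to produce); placing the cursor on the character and pressing ga shows its actual code, which is worth checking first.

```vim
" Assumption: <92> is byte 0x92 (Windows-1252 right single quote).
" Replace it with a plain apostrophe throughout the buffer.
:%s/\%x92/'/g

" Word usually pastes the companion quotes too: 0x91 (left single),
" 0x93 and 0x94 (left/right double). The e flag suppresses the
" "pattern not found" error when a file has none of them.
:%s/\%x91/'/ge
:%s/\%x93\|\%x94/"/ge

" If the file instead holds the real Unicode character U+2019,
" match it with \%u rather than \%x.
:%s/\%u2019/'/ge
```

The attempts in the question fail because \u and \x without the % mean something else in a Vim pattern (uppercase character and hex digit, respectively), so \x92 looks for a hex digit followed by "92".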
This solution might not address the problem as originally stated, but it does address a different but very closely related one and I think it makes a lot of sense to place it here.
I don't know in which version of Vim it was implemented, but I was working on 7.4 when I tried it.
When in Insert mode, the sequence to output Unicode characters is: ctrl-v u xxxx, where xxxx is the hexadecimal code point. For instance, outputting the euro sign would be ctrl-v u 20ac.
I tried it in Command mode as well and it worked. That is, to replace all instances of "20 euro" in my document with "20 €", I'd do:
:%s/20 euro/20 <ctrl-v u 20ac>/gc
In the above, <ctrl-v u 20ac> is not literal; it's the sequence of keys that will output the € character.