Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert escaped codepoint to unicode character

Tags:

vim

utf-8

iconv

I am trying to take a chunk of JSON that has strings which contain the literal characters \u009e and I would like to convert those characters to its associated single unicode character, in this case é.

I use curl or wget to download the json which looks like:

{ "name": "Kitsun\u00e9" }

And need to translate this in Vim to:

{ "name": "Kitsuné" }

My first thought was to use Vim's iconv, but it does not evaluate the string as a single character and just returns the input.

let code = '\u00e9'
echo iconv(code, "UTF-8", "UTF-8")
" Prints \u00e9

I want to eventually use something like

%s;\\u[0-9abcdef]*;\=iconv(submatch(0),"UTF-8", "UTF-8");g
like image 614
Tom Cammann Avatar asked Jan 12 '14 15:01

Tom Cammann


People also ask

How do I decode a string with escaped Unicode?

Use a decoder that interprets the '\u00c3' escapes as unicode code point U+00C3 (LATIN CAPITAL LETTER A WITH TILDE, 'Ã').

How do I change text to Unicode?

Unicode code converter. Type or paste text in the green box and click on the Convert button above it. Alternative representations will appear in all the other boxes. You can also do the same in any grey box, if you want to target only certain types of escaped text.

How will you convert an integer to a Unicode character?

unichr() is named chr() in Python 3 (conversion to a Unicode character).

How do you escape Unicode characters?

A unicode escape sequence is a backslash followed by the letter 'u' followed by four hexadecimal digits (0-9a-fA-F). It matches a character in the target sequence with the value specified by the four digits. For example, ”\u0041“ matches the target sequence ”A“ when the ASCII character encoding is used.


1 Answers

this line works for your example:

s#\\u[0-9a-f]*#\=eval('"'.submatch(0).'"')#

or

s#\v\\u([0-9a-f]{4})#\=nr2char(str2nr(submatch(1),16))#
like image 114
Kent Avatar answered Sep 22 '22 11:09

Kent