JavaScript Unicode: same letters but different Unicode

Question

I have to send text to a print service that only accepts certain special characters, e.g. ï. My client somehow enters text in which the letters look the same but are encoded with different underlying Unicode code points, so the print service does not process them correctly. Example:

Mine: ï (Unicode \u00EF)
Theirs: ï (Unicode \u0069\u0308). Copy-pasting the two symbols into, for example, the Chrome address bar or a textarea shows that they look exactly the same.

How can I convert all special characters from "their style" to "my style" (Dutch keyboard layout on Windows)? I guess this has something to do with the OS or the keyboard layout, but I cannot find a list of the differences or anything else related to this issue. Does anyone have a suggestion on how to proceed?

georg · Accepted Answer

As correctly pointed out in the comments, Unicode can represent an accented character in two canonically equivalent ways, each of which corresponds to a "normalization form":

with a single dedicated (precomposed) code point (\u00EF == ï), the composed form, NFC
with the base letter followed by a combining accent (i.e. i + ¨ == i + \u0308 == ï), the decomposed form, NFD
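
To make the difference between the two representations visible, here is a minimal sketch. The helper name codePoints is made up for this illustration and is not part of any library; it only lists the code points behind each string.

```javascript
// Two visually identical strings built from different code points.
const composed = "\u00EF";     // single code point U+00EF
const decomposed = "i\u0308";  // "i" followed by combining diaeresis U+0308

// Hypothetical helper (not a built-in): list the code points of a string
// as U+XXXX labels. Array.from iterates a string by code point.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

console.log(composed === decomposed);            // false
console.log(composed.length, decomposed.length); // 1 2
console.log(codePoints(composed).join(" "));     // U+00EF
console.log(codePoints(decomposed).join(" "));   // U+0069 U+0308
```

This is why a plain string comparison, or a service that only knows the precomposed character, treats the two inputs differently even though they render the same.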

ES6 adds a dedicated method that converts strings between normalization forms: String.prototype.normalize.

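// escape() is deprecated; it is used here only to make the code points visible.

// convert the single-code-point ("composed") form to the multi-code-point ("decomposed") form:
escape("\u00EF".normalize("NFD"))
> "i%u0308"

// convert the decomposed form back to the composed form:
escape("i\u0308".normalize("NFC"))
> "%EF"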

If your environment doesn't support normalize yet, look around for a shim (polyfill), such as the unorm package.
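
Applied to the question, one possible approach is to normalize the client's text to NFC before it goes to the print service. The sketch below assumes String.prototype.normalize is available and that the service expects precomposed characters; toComposed and hasCombiningMarks are hypothetical helper names, and the \p{M} check additionally assumes support for Unicode property escapes in regular expressions (ES2018).

```javascript
// Hypothetical helper: bring text into the composed (NFC) form before it is
// sent to the print service.
function toComposed(text) {
  return text.normalize("NFC");
}

// Hypothetical helper: report whether any combining marks are left.
// NFC cannot compose a sequence that has no precomposed code point, so this
// check can flag text the service may still reject.
function hasCombiningMarks(text) {
  return /\p{M}/u.test(text);
}

const theirs = "na\u0069\u0308ef"; // "naïef" typed as i + combining diaeresis
const mine = toComposed(theirs);

console.log(mine === "na\u00EFef");     // true
console.log(theirs.length, mine.length); // 6 5
console.log(hasCombiningMarks(theirs));  // true
console.log(hasCombiningMarks(mine));    // false
```

Normalizing once at the boundary, right before the text leaves for the service, keeps the rest of the application free to accept whatever form the client's keyboard or OS produces.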

Tags:

javascript

unicode

keyboard-layout

user3136936
