
Are there correct encodings for the backslash and tilde characters in Shift_JIS?

Or do these two characters simply not exist in Shift_JIS?

The first 128 characters in the Shift_JIS character encoding scheme match ASCII except for two: 0x5C is a yen sign (¥) instead of a backslash, and 0x7E is an overline (‾) instead of a tilde.

While there's plenty of clear information about how ¥ and ‾ take over for \ and ~, I haven't been able to find any clear statement about whether \ and ~ simply don't exist in Shift_JIS, or whether there are alternate (probably multi-byte) encodings for these two displaced ASCII characters.

When I try to encode \ or ~ using node-iconv, it throws an error.

iconv-lite encodes both ¥ and \ as 0x5C, and both ‾ and ~ as 0x7E. When decoding, iconv-lite currently (and unfortunately) decodes 0x5C as \ and 0x7E as ~, pending a response to a bug report.

kshetline asked Oct 15 '22 13:10


1 Answer

The character set of Shift_JIS is defined by JIS (Japanese Industrial Standards).

The Shift_JIS encoding uses JIS X 0201 for its half-width character set and JIS X 0208 for its full-width character set.

I assume \ and ~ in the question mean the half-width backslash and tilde as found in ISO/IEC 8859-1 (Latin-1). JIS X 0201, the half-width character set, doesn't contain these characters (see https://en.wikipedia.org/wiki/JIS_X_0201).

So the answer is that neither \ nor ~ exists in Shift_JIS.
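
As a minimal sketch of that conclusion, the 7-bit (JIS X 0201 Roman) half of Shift_JIS can be modelled as ASCII with two code points swapped out. The function names below are hypothetical, and the model deliberately ignores half-width katakana and all double-byte characters.

```python
# Hypothetical model of the 7-bit (JIS X 0201 Roman) range of Shift_JIS.
# It covers bytes 0x00-0x7F only; katakana and double-byte characters are out of scope.

YEN_SIGN = "\u00a5"   # occupies 0x5C in place of the backslash
OVERLINE = "\u203e"   # occupies 0x7E in place of the tilde

# Bytes whose meaning differs from ASCII, and the reverse lookup.
_DECODE_OVERRIDES = {0x5C: YEN_SIGN, 0x7E: OVERLINE}
_ENCODE_OVERRIDES = {ch: byte for byte, ch in _DECODE_OVERRIDES.items()}


def decode_roman_byte(byte: int) -> str:
    """Decode one byte in 0x00-0x7F under a strict JIS X 0201 Roman reading."""
    if not 0 <= byte <= 0x7F:
        raise ValueError(f"byte 0x{byte:02X} is outside the 7-bit range")
    return _DECODE_OVERRIDES.get(byte, chr(byte))


def encode_roman_char(ch: str) -> int:
    """Encode one character, raising if this model has no slot for it."""
    if ch in _ENCODE_OVERRIDES:
        return _ENCODE_OVERRIDES[ch]
    # Backslash and tilde lost their slots to the yen sign and overline.
    if ch in ("\\", "~") or len(ch) != 1 or ord(ch) > 0x7F:
        raise ValueError(f"{ch!r} has no single-byte encoding in this model")
    return ord(ch)
```

Under this strict reading, encoding \ or ~ fails, which is consistent with the error the question reports from node-iconv. Lenient converters such as iconv-lite instead treat 0x5C and 0x7E as plain ASCII.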

For reference, JIS X 0208 contains a full-width backslash (FULLWIDTH REVERSE SOLIDUS, U+FF3C in Unicode). JIS X 0208 doesn't contain a full-width tilde, but the Windows equivalent of Shift_JIS (Microsoft code page 932) does (FULLWIDTH TILDE, U+FF5E in Unicode).
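
One way to probe these full-width mappings is with Python's built-in `shift_jis` and `cp932` codecs. This is only a sketch: `encode_hex` is a hypothetical helper, and Python's codecs are just one implementation, so their choices may not match a strict reading of the JIS standards any more than iconv-lite's do.

```python
# Probe how Python's bundled codecs treat the two full-width characters.
FULLWIDTH_REVERSE_SOLIDUS = "\uff3c"
FULLWIDTH_TILDE = "\uff5e"


def encode_hex(ch, codec):
    """Return the encoded bytes as a hex string, or None if the codec rejects ch."""
    try:
        return ch.encode(codec).hex()
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    samples = {
        "FULLWIDTH REVERSE SOLIDUS": FULLWIDTH_REVERSE_SOLIDUS,
        "FULLWIDTH TILDE": FULLWIDTH_TILDE,
    }
    for name, ch in samples.items():
        for codec in ("shift_jis", "cp932"):
            print(f"{name:26} {codec:10} {encode_hex(ch, codec)}")
```

On CPython, `cp932` should accept FULLWIDTH TILDE while `shift_jis` should reject it, which lines up with the distinction drawn above between JIS X 0208 and code page 932.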

SATO Yusuke answered Oct 20 '22 14:10