I'm looking for a small, fast (in both directions) bijective mapping between the following list of integers and a subset of the range 0-127:
0x200C, 0x200D, 0x200E, 0x200F,
0x2013, 0x2014, 0x2015, 0x2017,
0x2018, 0x2019, 0x201A, 0x201C,
0x201D, 0x201E, 0x2020, 0x2021,
0x2022, 0x2026, 0x2030, 0x2039,
0x203A, 0x20AA, 0x20AB, 0x20AC,
0x20AF, 0x2116, 0x2122
One obvious solution is:
y = x>>2 & 0x40 | x & 0x3f;
x = 0x2000 | y<<2 & 0x100 | y & 0x3f;
Edit: I was missing some of the values, particularly 0x20Ax, which don't work with the above.
Another obvious solution is a lookup table, but without making it unnecessarily large, a lookup table would require some bit rearrangement anyway and I suspect the whole task can be better accomplished with simple bit rearrangement.
For the curious, those magic numbers are the only "large" Unicode codepoints that appear in legacy ISO-8859 and Windows codepages.
This method uses multiplication in a finite field:
#define PRIME 0x119
#define OFFSET1 0x00f
#define OFFSET2 0x200c
#define OFFSET3 (OFFSET2 - OFFSET1)
#define MULTIPLIER 2
#define INVERSE 0x8d
unsigned map(unsigned n)
{
return ((n - OFFSET3) * MULTIPLIER) % PRIME;
}
unsigned unmap(unsigned m)
{
return ((m * INVERSE) + PRIME - OFFSET1) % PRIME + OFFSET2;
}
map()
converts the unicode points to the unique 7 bit numbers, and unmap()
does the reverse. Note that gcc
at least is able to compile this to x86 code which does not use any division operations, since the modulus is a constant.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With