Efficient mapping for a particular finite integer set

Question

I'm looking for a small, fast (in both directions) bijective mapping between the following list of integers and a subset of the range 0-127:

0x200C, 0x200D, 0x200E, 0x200F,
0x2013, 0x2014, 0x2015, 0x2017,
0x2018, 0x2019, 0x201A, 0x201C,
0x201D, 0x201E, 0x2020, 0x2021,
0x2022, 0x2026, 0x2030, 0x2039,
0x203A, 0x20AA, 0x20AB, 0x20AC,
0x20AF, 0x2116, 0x2122

~~One obvious solution is:~~

y = x>>2 & 0x40 | x & 0x3f;
x = 0x2000 | y<<2 & 0x100 | y & 0x3f;

Edit: I was missing some of the values, particularly 0x20Ax, which don't work with the above.

Another obvious solution is a lookup table, but without making it unnecessarily large, a lookup table would require some bit rearrangement anyway and I suspect the whole task can be better accomplished with simple bit rearrangement.

For the curious, those magic numbers are the only "large" Unicode codepoints that appear in legacy ISO-8859 and Windows codepages.

caf · Accepted Answer

This method uses multiplication in a finite field:

#define PRIME 0x119
#define OFFSET1 0x00f
#define OFFSET2 0x200c
#define OFFSET3 (OFFSET2 - OFFSET1)
#define MULTIPLIER 2
#define INVERSE 0x8d

unsigned map(unsigned n)
{
    return ((n - OFFSET3) * MULTIPLIER) % PRIME;
}

unsigned unmap(unsigned m)
{
    return ((m * INVERSE) + PRIME - OFFSET1) % PRIME + OFFSET2;
}

map() converts the unicode points to the unique 7 bit numbers, and unmap() does the reverse. Note that gcc at least is able to compile this to x86 code which does not use any division operations, since the modulus is a constant.

Efficient mapping for a particular finite integer set

Tags:

c

bit-manipulation

unicode

mapping

R.. GitHub STOP HELPING ICE

1 Answers

caf

Recent Activity

Donate For Us

Efficient mapping for a particular finite integer set

Tags:

c

bit-manipulation

unicode

mapping

R.. GitHub STOP HELPING ICE

1 Answers

caf

Related questions

Recent Activity

Donate For Us